
Monitor the growth of CheckUser tables thanks to the addition of Thanks data
Closed, ResolvedPublic

Description

T252226 will be reaching the Wikipedias in about a week. We would like to monitor the impact of this additional data on the size of cu_changes tables on the large wikis.

To quote @Marostegui from T252226#6282014: we would like to see how much the tables grow per week on the biggest wikis (enwiki, commons, wikidata) for around 1 month since it gets fully deployed.

The DBA team can check those % from the backups.

UPDATE
For easier access to the data, this table will be updated every time new data is retrieved and shared by the DBA team. Sizes are presented as "Compressed / Uncompressed".

| wiki | 2020-07-07 | 2020-07-14 | 2020-07-21 | 2020-07-28 |
| --- | --- | --- | --- | --- |
| enwiki | 1.4G / 5.9G | 1.4G / 5.8G | 1.4G / 5.7G | 1.4G / 5.7G |
| commonswiki | 1.1G / 7.9G | 1.1G / 7.8G | 1.1G / 7.5G | 1.1G / 7.3G |
| wikidatawiki | 2.5G / 21.8G | 2.5G / 21.9G | 2.5G / 21.8G | 2.5G / 21.6G |

Event Timeline

Restricted Application added a subscriber: Aklapper.
Huji changed the task status from Open to Stalled. Jul 6 2020, 3:59 PM

Stalled until T252226 actually reaches production servers.

This would be a good time for the DBA team to come up with strategies on how to run the queries, who runs the queries, etc.

Current sizes as of 7th July:

wikidata

-rw-r--r-- 1 dump dump 644M Jul  7 06:21 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00000.sql.gz
-rw-r--r-- 1 dump dump 655M Jul  7 06:21 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00001.sql.gz
-rw-r--r-- 1 dump dump 649M Jul  7 06:21 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00002.sql.gz
-rw-r--r-- 1 dump dump 638M Jul  7 06:22 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00003.sql.gz

commonswiki

-rw-r--r-- 1 dump dump 1.1G Jul  7 02:25 dump.s4.2020-07-07--02-00-08/commonswiki.cu_changes.00000.sql.gz

enwiki

-rw-r--r-- 1 dump dump 1.4G Jul  7 02:57 dump.s1.2020-07-07--02-30-21/enwiki.cu_changes.sql.gz
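
Because the wikidatawiki dump is split into several chunks, the per-wiki compressed size has to be summed across files. A minimal sketch of that sum, assuming the listing is `ls -lh` output in the format pasted above; the function name `compressed_totals` and the regular expression are illustrative, not part of any DBA tooling, and the totals are approximate because `ls -lh` already rounds each size:

```python
import re
from collections import defaultdict

# Multipliers for the human-readable suffixes printed by `ls -lh` (binary units).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

# Captures the size field, its unit, and the wiki name from the dump file path.
LINE_RE = re.compile(
    r"\s(\d+(?:\.\d+)?)([KMG])\s.*?/(\w+)\.cu_changes(?:\.\d+)?\.sql\.gz$"
)

# Listing copied from the 7th July comment.
LISTING = """\
-rw-r--r-- 1 dump dump 644M Jul  7 06:21 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00000.sql.gz
-rw-r--r-- 1 dump dump 655M Jul  7 06:21 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00001.sql.gz
-rw-r--r-- 1 dump dump 649M Jul  7 06:21 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00002.sql.gz
-rw-r--r-- 1 dump dump 638M Jul  7 06:22 dump.s8.2020-07-07--05-55-29/wikidatawiki.cu_changes.00003.sql.gz
-rw-r--r-- 1 dump dump 1.1G Jul  7 02:25 dump.s4.2020-07-07--02-00-08/commonswiki.cu_changes.00000.sql.gz
-rw-r--r-- 1 dump dump 1.4G Jul  7 02:57 dump.s1.2020-07-07--02-30-21/enwiki.cu_changes.sql.gz
"""


def compressed_totals(listing):
    """Sum the compressed size in bytes of all cu_changes chunks per wiki."""
    totals = defaultdict(float)
    for line in listing.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            size, unit, wiki = match.groups()
            totals[wiki] += float(size) * UNITS[unit]
    return dict(totals)


if __name__ == "__main__":
    for wiki, size in sorted(compressed_totals(LISTING).items()):
        print(f"{wiki}: {size / 1024**3:.1f}G")
```

For the 7th July listing, the four wikidatawiki chunks add up to roughly 2.5G, which is the compressed figure shown for wikidatawiki in the task description.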

@Huji Is there a similar task for tracking table bloat from login attempts data?

@Niharika not yet. We are still two steps away from that data getting into CU tables (step one, step two). Also, that feature will be initially turned on for a small subset of wikis, so we will have to do a more targeted evaluation of DB size growth. Lastly, we don't want that feature to be enabled within the next 2-3 weeks (not that it would, anyway) because those select wikis will then have two features enabled in close proximity in time, and monitoring their data growth will cause confusion.

Got it. Thank you for clarifying.

Huji changed the task status from Stalled to Open. Jul 13 2020, 1:16 PM

Not stalled anymore, since the feature created in T255526 is now live on all WMF wikis.

@Marostegui can I ask you to provide an update every Tuesday?

14th July:

-rw-r--r-- 1 dump dump 1.1G Jul 14 00:46 dump.s4.2020-07-14--00-20-39/commonswiki.cu_changes.00000.sql.gz

-rw-r--r-- 1 dump dump 640M Jul 14 03:56 dump.s8.2020-07-14--03-29-39/wikidatawiki.cu_changes.00000.sql.gz
-rw-r--r-- 1 dump dump 654M Jul 14 03:55 dump.s8.2020-07-14--03-29-39/wikidatawiki.cu_changes.00001.sql.gz
-rw-r--r-- 1 dump dump 645M Jul 14 03:56 dump.s8.2020-07-14--03-29-39/wikidatawiki.cu_changes.00002.sql.gz
-rw-r--r-- 1 dump dump 642M Jul 14 03:55 dump.s8.2020-07-14--03-29-39/wikidatawiki.cu_changes.00003.sql.gz

-rw-r--r-- 1 dump dump 1.4G Jul 14 01:18 dump.s1.2020-07-14--00-52-21/enwiki.cu_changes.sql.gz

21st July:

-rw-r--r-- 1 dump dump 1.1G Jul 21 00:26 dump.s4.2020-07-21--00-00-01/commonswiki.cu_changes.00000.sql.gz

-rw-r--r-- 1 dump dump 632M Jul 21 00:25 dump.s8.2020-07-21--00-00-01/wikidatawiki.cu_changes.00000.sql.gz
-rw-r--r-- 1 dump dump 645M Jul 21 00:26 dump.s8.2020-07-21--00-00-01/wikidatawiki.cu_changes.00001.sql.gz
-rw-r--r-- 1 dump dump 635M Jul 21 00:26 dump.s8.2020-07-21--00-00-01/wikidatawiki.cu_changes.00002.sql.gz
-rw-r--r-- 1 dump dump 642M Jul 21 00:26 dump.s8.2020-07-21--00-00-01/wikidatawiki.cu_changes.00003.sql.gz

-rw-r--r-- 1 dump dump 1.4G Jul 21 01:15 dump.s1.2020-07-21--00-49-29/enwiki.cu_changes.sql.gz

Huji updated the task description.

@Marostegui is it okay that we are looking at the gzipped file size? Could there be an edge case where the uncompressed file size has grown significantly, but gzip has done such a fantastic job with the compression that the compressed file size has not changed by much?

We have some alerts on the backups that measure the delta between weekly backups and that has been working fine lately, so I wouldn't expect that to be an issue here. If we want to be double sure, I can also take a look at the uncompressed files.
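
One way to look at the uncompressed size without writing the decompressed dump to disk is to stream the `.sql.gz` file and count the bytes. This is only a sketch of that idea, not the procedure the DBA team uses; the function name `uncompressed_size` and the example path are hypothetical. Streaming is used instead of reading the size field in the gzip trailer because that field stores the length modulo 2^32 and is therefore unreliable for dumps larger than 4 GiB:

```python
import gzip


def uncompressed_size(path, chunk_size=1 << 20):
    """Return the number of bytes a gzip file expands to, reading it in chunks."""
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    import os
    import sys

    # Usage (hypothetical path): python uncompressed_size.py enwiki.cu_changes.sql.gz
    for dump_path in sys.argv[1:]:
        compressed = os.path.getsize(dump_path)
        expanded = uncompressed_size(dump_path)
        print(f"{dump_path}: {compressed} bytes compressed, {expanded} bytes uncompressed")
```

Comparing both numbers week over week would catch the edge case raised above, where the uncompressed size grows while the compressed size stays flat.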

Niharika renamed this task from Monitor the growth of CheckUser tables after the addition of Thanks data to Monitor the growth of CheckUser tables thanks to the addition of Thanks data. Jul 26 2020, 7:21 PM

Couldn't help it.

Oh well. When we get to the similar task that has to do with the addition of the login logs to the CU logs, I will make sure to call you for help with naming the task ;)

Just wanted to say thanks to everyone for the hard work

@Marostegui assuming that numbers you pull tomorrow are not drastically different than the last 3 weeks, we could mark this task as resolved; after all, our plan was to monitor this for about one month.

Marostegui updated the task description.

I have added the results for today. They are pretty much the same as what we have seen over the past 4 weeks.
This seems stable, so I am resolving this.
Thanks everyone!

@Marostegui am I reading this right that none of the tables' compressed sizes changed, and other than 07-07->07-14 on wikidata, every week saw either a slight shrink or no change to the uncompressed size? I.e. the thanks data made basically no impact?

Correct, no noticeable impact on disk size.
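
The week-over-week reading above can be checked with a few lines of arithmetic. The sketch below uses the uncompressed sizes from the table in the task description; the names `UNCOMPRESSED_GB` and `weekly_changes` are illustrative only, and the percentages inherit the one-decimal rounding of the reported sizes:

```python
# Uncompressed cu_changes sizes in GB, as reported in the task description
# for 2020-07-07, 2020-07-14, 2020-07-21 and 2020-07-28.
UNCOMPRESSED_GB = {
    "enwiki": [5.9, 5.8, 5.7, 5.7],
    "commonswiki": [7.9, 7.8, 7.5, 7.3],
    "wikidatawiki": [21.8, 21.9, 21.8, 21.6],
}


def weekly_changes(sizes):
    """Return the percentage change between each pair of consecutive weeks."""
    return [
        round((current - previous) / previous * 100, 1)
        for previous, current in zip(sizes, sizes[1:])
    ]


if __name__ == "__main__":
    for wiki, sizes in UNCOMPRESSED_GB.items():
        print(wiki, weekly_changes(sizes))
```

With these figures, the only positive change is wikidatawiki between 07-07 and 07-14; every other week shows either no change or a small decrease.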

Thanks everyone!

I bet @Niharika would have appreciated it if you had written that as "Thanks everyone!" ;)

But seriously, thank you for taking the lead on this.