Thanks activities should be logged by CheckUser
Closed, Resolved · Public

Description

Thanks should call the proper CheckUser hooks so that Thanks actions are logged by CheckUser.

Note that Thanks logs do not appear in RecentChanges. Neither do AbuseFilter logs, yet those are included in CU logs and are actually very useful.

Justification

Users who are blocked or are actively evading the project may still use Thanks to "silently" interact with other users. Logging these actions would give us more recent CU logs for them during such periods of inactivity.

  • Demonstrating Example: Using fawiki from 2020-01-01 to date, I was able to find 558 users who used the Thanks feature on a day on which they did not make any edits.
    • For instance, this user used Thanks 20 times on April 2nd, but their last edits and non-Thanks public logs are from 2017.
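
A query along these lines could be used to find such users. This is only a minimal sketch, not necessarily the query behind the 558 figure: the two tables are simplified stand-ins for MediaWiki's `logging` and `revision` tables, the helper names are hypothetical, and deleted edits (which live in a separate table) are ignored.

```python
import sqlite3

# Simplified, hypothetical stand-ins for MediaWiki's logging and revision
# tables; the real schema has many more columns.
SCHEMA = """
CREATE TABLE logging (log_type TEXT, log_actor INTEGER, log_timestamp TEXT);
CREATE TABLE revision (rev_actor INTEGER, rev_timestamp TEXT);
"""

# Timestamps are assumed to be 14-character YYYYMMDDHHMMSS strings, so the
# first 8 characters identify the day.
QUERY = """
SELECT DISTINCT log_actor
FROM logging
WHERE log_type = 'thanks'
  AND log_timestamp >= :since
  AND NOT EXISTS (
    SELECT 1 FROM revision
    WHERE rev_actor = log_actor
      AND substr(rev_timestamp, 1, 8) = substr(log_timestamp, 1, 8)
  )
ORDER BY log_actor
"""


def thank_only_actors(conn, since):
    """Return actors who thanked on at least one day on which they made no edits."""
    return [row[0] for row in conn.execute(QUERY, {"since": since})]


def demo_connection():
    """Build an in-memory database with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO logging VALUES (?, ?, ?)", [
        ("thanks", 1, "20200402101500"),  # actor 1 thanks, no edit that day
        ("thanks", 2, "20200402120000"),  # actor 2 thanks and edits that day
        ("block", 3, "20200402130000"),   # not a Thanks log entry
    ])
    conn.executemany("INSERT INTO revision VALUES (?, ?)", [
        (1, "20170615080000"),  # actor 1 last edited in 2017
        (2, "20200402090000"),
    ])
    return conn
```

With the made-up rows above, `thank_only_actors(demo_connection(), "20200101000000")` returns only actor 1.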

Considerations

Adding Thanks logs will make the cu_changes table larger. We should estimate by how much, considering that cu_changes is already pretty large on the largest wikis (such as enwiki or wikidatawiki).

Event Timeline

Restricted Application added a subscriber: Aklapper.
DannyS712 added a project: DBA.
DannyS712 subscribed.

Tagging DBA for feedback on whether it's okay to expand the cu_changes table.

See updated task definition. The additional rows may not be that many.

Not sure how large the table currently is, but adding 30,000 rows should probably be looked at by DBA first.

I am not debating that; just indicating it may be an easy "yes" by DBA.

Also, note that it is not 30K additional rows, but rather 90K (because of the 3-month retention policy).

I am guessing enwiki has tens of millions of rows in cu_changes (given their 5 million edits per month plus other logs). For wikidata it is probably on the order of hundreds of millions of rows (given their 20 million edits per month).

In T247540, Reedy indicated that the table has about 60,000,000 rows on wikidata and 10,000,000 on enwiki.

Marostegui subscribed.

From what I can see, these are the row counts for cu_changes:
enwiki: 18305102
commons: 24673959
wikidata: 67864207

So the numbers you provided (i.e. 30k, 2k) are the extra rows per month, and on top of that we apply the 3-month retention policy?
After those 3 months, will we clean up those rows?

Yes, that is believed to be the DB impact of such logging.
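
The estimate above can be sanity-checked with a few lines of arithmetic. This is a rough sketch using the row counts and monthly figures quoted in this thread; it assumes that "30k" refers to enwiki and "2k" to wikidatawiki, which the thread does not spell out.

```python
RETENTION_MONTHS = 3  # retention policy stated in the thread

# Row counts reported in the thread for cu_changes.
CURRENT_ROWS = {"enwiki": 18_305_102, "wikidatawiki": 67_864_207}

# Assumption: the "30k, 2k" monthly estimates map to enwiki and
# wikidatawiki respectively; the thread does not state the mapping.
EXTRA_ROWS_PER_MONTH = {"enwiki": 30_000, "wikidatawiki": 2_000}


def steady_state_extra_rows(per_month, retention_months=RETENTION_MONTHS):
    """Extra rows held at any one time once purging has caught up."""
    return per_month * retention_months


def relative_growth_percent(wiki):
    """Extra rows as a percentage of the table's current row count."""
    extra = steady_state_extra_rows(EXTRA_ROWS_PER_MONTH[wiki])
    return 100 * extra / CURRENT_ROWS[wiki]


for wiki in CURRENT_ROWS:
    extra = steady_state_extra_rows(EXTRA_ROWS_PER_MONTH[wiki])
    print(f"{wiki}: +{extra:,} rows ({relative_growth_percent(wiki):.2f}%)")
```

Under those assumptions this comes out to roughly 0.49% for enwiki and 0.01% for wikidatawiki.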

So as long as we use the normal methods to delete rows (small batches, waiting for replication, etc.), I think we should be fine.
Considering the sizes of the tables, the percentage of additional rows (including the retention policy) is 0.5% on enwiki and 0.01% on wikidatawiki.

Let's please make sure the purging works as expected, to keep the table from getting out of control.
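
For illustration, the "small batches" approach could look roughly like the sketch below. It is not CheckUser's actual purge code: it runs against SQLite for simplicity, the function name and parameters are hypothetical, and a plain sleep stands in for waiting on replication.

```python
import sqlite3
import time


def purge_expired_rows(conn, cutoff, batch_size=500, pause_seconds=0.0):
    """Delete cu_changes rows older than `cutoff` in small batches.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Pick the next small batch of expired rows by primary key.
        ids = [row[0] for row in conn.execute(
            "SELECT cuc_id FROM cu_changes "
            "WHERE cuc_timestamp < ? ORDER BY cuc_id LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"DELETE FROM cu_changes WHERE cuc_id IN ({placeholders})", ids
        )
        conn.commit()  # keep each transaction short
        total += len(ids)
        # Hypothetical stand-in for waiting until replicas have caught up.
        time.sleep(pause_seconds)
```

Keeping each transaction small is the point of the design: a single large delete would hold locks longer and could let replicas fall behind.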

Change 596998 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/extensions/Thanks@master] ApiThank: Send thanks logs to checkuser when extension is enabled

https://gerrit.wikimedia.org/r/596998

Change 596998 merged by jenkins-bot:
[mediawiki/extensions/Thanks@master] ApiThank: Send thanks logs to CheckUser when extension is enabled

https://gerrit.wikimedia.org/r/596998

Okay, the patch has merged. Should we leave this open for monitoring of the table size for the next 90 days?

Which wikis have it currently enabled?

All wikis with CU and Thanks; there is no feature flag.

I just used User:T255506 to thank myself on fawiki and then I ran a check on that user. The check returned no results for the past 7 days.

Looking at Special:Version on fawiki, Thanks is on 895bb0b99d1b940, which is the June 29th branch. The patch was merged on July 3rd, so it has not reached the Wikipedias yet. Let's keep this open until we verify that it works on the Wikipedias. After that, I think we should close this and open a new task for monitoring DB size. We should spell out the definition of done for that new task: Are we going to monitor indefinitely, for a few weeks, or for months? Are we going to take specific actions if we notice the DB size is growing too fast (which is unlikely)? How often are we going to pull row counts from the DB? Are we going to look at the total row count, or just the count of rows from Thanks? All of these need to be defined in the new task. @DannyS712 do you want to create it?

I'll leave that decision to @Marostegui

I would like to see how much the tables grow per week on the biggest wikis (enwiki, commons, wikidata) for around 1 month after it is fully deployed. We can check those percentages from the backups.
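
As a rough sketch of that check: given one row count per week (for example, taken from the backups), the week-over-week growth can be computed as below. The function name is hypothetical, and the numbers in the usage note are placeholders, not real measurements.

```python
def weekly_growth_percent(row_counts):
    """Given row counts sampled once per week, return week-over-week growth in percent."""
    return [
        100 * (current - previous) / previous
        for previous, current in zip(row_counts, row_counts[1:])
    ]
```

For example, placeholder counts of 1000, 1010 and 1030 give growth of 1.0% in the first week and about 1.98% in the second.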

@DannyS712 one more thought: should we do a User-notice for this change?

Probably

I'm confused. Isn't Thanks public anyway? What will CheckUsers be able to see that they couldn't before?

The Thanks log entries are public, but the UA and IP of the user were not available in CU, unlike for other logs in MW core.

That makes more sense. Thanks!

I just had my bot thank me on fawiki and then ran a check on it, and I can confirm that the thank action was in the CU results.

Also, the current Thanks log only shows who thanked whom. Why does it not list which edit or log action the thanks was for?

@Naleksuh that was never a public log. I think adding that feature would require approval not just from WMF teams (Legal and T&S) but also from the communities. I, for one, think that is not necessary information for CUs to know and would oppose it. As far as CU is concerned, just having the IP and UA associated with the Thanks event should suffice.