Page MenuHomePhabricator

log id missing from mediawiki_private_cu_log?
Closed, InvalidPublic

Description

Context: As part of WE 1.5.4, we are defining a Monthly Active Moderators metric. This metric will count the number of users, per wiki, who perform x number of moderator actions during a given month. As part of this work, we need to determine whether moderator actions were performed on desktop vs mobile.

One of the proposed moderator actions to be used as a signal for this metric is "Running a check (via checkuser)". So we need to be able to query change tags associated with specific CheckUser checks, in order to see whether or not the check was done on mobile or not. To do this, I need to join mediawiki_private_cu_log with mediawiki_logging, so that I can then join with the change_tag table.

Problem: I don't see a foreign key in mediawiki_private_cu_log that allows me to join with mediawiki_logging. Is it possible that a log id field (like cule_log_id which is available in the cu_log_event_table) is missing from mediawiki_private_cu_log?

Event Timeline

Moving over from the conversation on Slack:

I think the key fact here is that cu_log_event and cu_log serve very different purposes despite the confusing similarity in their names.

cu_log_event stores the IP and user-agent used to perform a log event (e.g. if I move a page, my IP and user-agent get stored) so that checkusers can use it for sockpuppet investigations. Since this is a supplement the information in logging which details what happened in the log event, naturally it has the foreign key to join with logging.

However, cu_log records when checkusers use their privileges (e.g. if you are a checkuser and look up my IP, that event gets stored in cu_log) so that those actions can be scrutinized if necessary. The information is stored in cu_log instead of in the logging table, so the rows here don't correspond to any rows in logging and therefore there is no foreign key.

Interestingly, revealing the IPs of temp accounts apparently is stored in the logging with log_type = 'checkuser-temporary-account'. But that strengthens my point, because there is no similar log_type='checkuser' that you could join cu_log to.

Can you please associate one or more active project tags with this task (via the Add Action...Change Project Tags dropdown)? That will allow to see a task when looking at project workboards or searching for tasks in certain projects, and get notified about a task when watching a related project tag. Thanks!

Dreamy_Jazz closed this task as Invalid.EditedMar 10 2026, 5:30 PM
Dreamy_Jazz subscribed.

Moving over from the conversation on Slack:

I think the key fact here is that cu_log_event and cu_log serve very different purposes despite the confusing similarity in their names.

Sure. Documentation for these tables (https://www.mediawiki.org/wiki/Extension:CheckUser/cu_log_table and https://www.mediawiki.org/wiki/Extension:CheckUser/cu_log_event_table) should make this distinction clearer, but naming things is always hard :D

cu_log_event stores the IP and user-agent used to perform a log event (e.g. if I move a page, my IP and user-agent get stored) so that checkusers can use it for sockpuppet investigations. Since this is a supplement the information in logging which details what happened in the log event, naturally it has the foreign key to join with logging.

Yes, a few years ago we created this table to migrate entries from cu_changes to a table where we would instead rely on the logging table to store the info

However, cu_log records when checkusers use their privileges (e.g. if you are a checkuser and look up my IP, that event gets stored in cu_log) so that those actions can be scrutinized if necessary. The information is stored in cu_log instead of in the logging table, so the rows here don't correspond to any rows in logging and therefore there is no foreign key.

Interestingly, revealing the IPs of temp accounts apparently is stored in the logging with log_type = 'checkuser-temporary-account'. But that strengthens my point, because there is no similar log_type='checkuser' that you could join cu_log to.

Yes, no cu_log row has any relation to a logging row. See T309999: Convert Special:CheckUserLog to use Special:Log which would migrate the entries to the logging table, primarily because the cu_log table does not have any tags or other features that Special:Log provides.

Therefore, there is no change_tag relationship for any rows in cu_log


Closing this as invalid, as from what I can see it this is based on a misunderstanding of the schema for the tables and that it would require adding change_tag support first to CheckUser logs (which I feel like would be better served by making these actual logging rows that then solves this issue without any further work)

It would be helpful for Product Safety and Integrity to know why this is needed if there are any outstanding questions or if you need additional data (if this isn't public feel free to post to #talk-to-safety-and-security on Slack)

Having done some searching, I'm guessing that T419434: [SPIKE] Mobile tags for log actions is the the motivation behind this?

Based on that, I can give some additional info which may be helpful:

  • We do not store whether a user was using a mobile or desktop device for actions that result in a cu_log row being created.
    • If this is a need for future actions, then instrumentation for loads of these pages would probably be quicker to implement than the database migration of cu_log rows to the logging table
  • The web request logs for POST requests to Special:CheckUser should give you a good approximation of this data, as generally one POST request corresponds to one cu_log row. However, that is not easy to query and would have more pitfalls than checking the tags on a log entry
CMyrick-WMF updated the task description. (Show Details)
CMyrick-WMF updated the task description. (Show Details)

Thanks so much for all the info @Dreamy_Jazz. I've added more context to the description, so that my use case and its connection to OKR work is clearer.

Just to be clear,

  • We do not store whether a user was using a mobile or desktop device for actions that result in a cu_log row being created.

Does this mean (1) we do not store whether the user running the check did so via mobile or desktop, and/or (2) these checks do not receive a "mobile" or "mobile edit" change tag when performed on mobile?

Does this mean (1) we do not store whether the user running the check did so via mobile or desktop, and/or (2) these checks do not receive a "mobile" or "mobile edit" change tag when performed on mobile?

Yes, both of these are the case.

Essentially we have no data on whether the user running the check was using a mobile device or desktop device. The only way to get this data that I can think of is the web request logs or for a small subset of the recent rows in cu_log the /analytics/legacy/specialinvestigate/1.1.0 eventlogging schema may have this data(?)

For any large scale collection of this data, we would need to migrate cu_log to the logging table which is a very large technical effort

If a general overview of desktop vs mobile is useful, a vast majority of the rows in cu_log will have been made by users either using a desktop device, as the tools are not well optimised for mobile devices. Of those using mobile, most will be using the desktop view

Can you please associate one or more active project tags with this task (via the Add Action...Change Project Tags dropdown)? That will allow to see a task when looking at project workboards or searching for tasks in certain projects, and get notified about a task when watching a related project tag. Thanks!

Definitely! Done.

If a general overview of desktop vs mobile is useful, a vast majority of the rows in cu_log will have been made by users either using a desktop device, as the tools are not well optimised for mobile devices. Of those using mobile, most will be using the desktop view

Yes, we figured that would be the case. However, we'd like to get a baseline, so that as (potential) tool improvements are made, we can track the number of users doing checks on mobile.

Does this mean (1) we do not store whether the user running the check did so via mobile or desktop, and/or (2) these checks do not receive a "mobile" or "mobile edit" change tag when performed on mobile?

Yes, both of these are the case.

Essentially we have no data on whether the user running the check was using a mobile device or desktop device. The only way to get this data that I can think of is the web request logs or for a small subset of the recent rows in cu_log the /analytics/legacy/specialinvestigate/1.1.0 eventlogging schema may have this data(?)

For any large scale collection of this data, we would need to migrate cu_log to the logging table which is a very large technical effort

Understood. Thank you for the explanations!

However, we'd like to get a baseline, so that as (potential) tool improvements are made, we can track the number of users doing checks on mobile.

Sure. For generating this data, I would recommend that this task be re-written as a feature request to add/update instrumentation (and then reopen this task too). This instrumentation would indicate whether mobile devices are being used to run the check

Implementing that instrumentation would be a lot easier than adding change_tag support (change_tag support for cu_log probably requires it's own OKR hypothesis to do properly).


I am not sure that Product Safety and Integrity will have the time to add this instrumentation, so if this is something you would need Product Safety and Integrity to do I'd recommend coordinating with @OKryva-WMF, @EMill-WMF and @Rsilvola to work how this might fit in to Product Safety and Integrity work.