Page MenuHomePhabricator

Fix misattribution of block due to bad values in the ipblocks ipb_by_actor field
Closed, ResolvedPublicSecurity

Description

NOTE: Filling as a security issue, because it makes the issue described in T260485 (which is still private) obvious. Feel free to publish once the other task is published.

In T260485, Urbanecm noticed CentralAuth uses wrong ipb_by_actor values. That broke the block in some cases, if no corresponding row exists in the actor table. See T261325: Fix rows in ipblocks that point to a non-existing user in ipb_by_actor field, due to T260485 for fixing that.

IMPACT: As of now (2020-08-24), the issue is fixed in Wikimedia production, but there is still a lot (thousands?) of misattributed blocks. This means that when looking at Special:Special:BlockList, the "blocking admin" column would indicate a user who did not perform the block. This may cause confusion, especially in controversial cases.

We need to find a reliable way to fix them all, possibly via a maintenance script ran in someone's home, to not clutter git history more than necessary?

In case an actor row exists, it is however pretty hard to find the cases where it is the incorrect actor, and who the true actor is. If we know a row is wrong, we can try to find an entry in ipblocks on meta with the same ipb_address and a timestamp close to the local one. But how can we know whether the rows is wrong? Perhaps we could check whether the indicated actor even has (or ever had) the rights to perform the block. If not, the attribution must be wrong.

Details

Author Affiliation
Wikimedia Communities

Event Timeline

Aklapper renamed this task from Fix incosistent ipblocks ipb_by_actor rows to Fix inconsistent ipblocks ipb_by_actor rows.Aug 25 2020, 7:59 AM

https://login.wikimedia.org/wiki/Special:BlockList can give you an idea of the size of the issue I guess. Check for blocks not done by Meta>xxx users (except those done locally using the suppress script while T260485 remained unfixed).

Having looked at a few databases, the problem seems quite extensive: T260485#6389423 (loginwiki had >800 affected blocks)

I like the idea of using a fake actor instead of trying to find out the original blocker, which sounds like a lot of effort.

We need to find a reliable way to fix them all, possibly via a maintenance script ran in someone's home, to not clutter git history more than necessary?

It might be helpful to provide a maintenance script in the repo for this - in theory third parties could have been affected too.

daniel renamed this task from Fix inconsistent ipblocks ipb_by_actor rows to Fix misattribution of block due to bad values in the ipblocks ipb_by_actor field.Aug 26 2020, 4:30 PM
daniel updated the task description. (Show Details)
daniel added a project: Platform Engineering.
daniel subscribed.

I updated the task description, so this task is now about fixing attribution, while T261325 is about fixing broken blocks. Restoring functionality seems a lot easier than finding out which block is attributed to the wrong user...

It might be helpful to provide a maintenance script in the repo for this - in theory third parties could have been affected too.

Sure, we can only put it into the repo *after* the security release, otherwise we'd make the issue public before people had time to fix it.

I imagine you could find the correct actor by using the CentralAuth log on meta?

I imagine you could find the correct actor by using the CentralAuth log on meta?

If, and that's a big if, is CentralAuth the only user of the broken functionality. I don't know, to be honest.

I imagine you could find the correct actor by using the CentralAuth log on meta?

Maybe, but as far as I can see, it's not going to be easy. The trouble is even knowing which rows in ipblocks are wrong. I think we would have would have to:

  • look at every ipblocks entry from the last year, on each wiki. "Somehow" figure out if the rows is suspicious, perhaps be checking group membership of the actor (slow!)
  • check if there is a CentralAuth log entry on meta for the same target user and "around the same time". Unfortunately, the timestamps will not be exactly the same.
  • make sure that log entry doesn't belong to *another* ipblocks entry
  • re-assign the local ipblocks entry to another actor, hoping that we didn't just add to the problem.

If, and that's a big if, is CentralAuth the only user of the broken functionality. I don't know, to be honest.

I made a patch. Care to review? T260485#6415184

I imagine you could find the correct actor by using the CentralAuth log on meta?

Maybe, but as far as I can see, it's not going to be easy. The trouble is even knowing which rows in ipblocks are wrong.

Apparently wrong ipblocks entries can be detected via T260485#6389423, which seems to be reasonably-fast. I wouldn't care much about the rest, mainly because I'm not sure how to detect them. A relatively-fast check would be to check on global groups of the stated actor only for ipblocks with comment matching the CentralAuth-originated block (so-far, the only known occurance of this issue).

[...]

  • check if there is a CentralAuth log entry on meta for the same target user and "around the same time". Unfortunately, the timestamps will not be exactly the same.

We don't need to do that. If it's CentralAuth block, it has a fairly specific comment, and the actor exists at meta, so we can just look there, check it matches our expectation (starts with Meta>) and use it for reverse-actor-searching.

  • make sure that log entry doesn't belong to *another* ipblocks entry
  • re-assign the local ipblocks entry to another actor, hoping that we didn't just add to the problem.

Anti-Harassment is this something that you could triage? How urgent is this?

daniel updated the task description. (Show Details)
eprodromou subscribed.

@AMooney I checked with @Niharika and we think it's a "medium".

I don't understand the difficulty here. The affected ipblocks rows have a distinctive reason (centralauth-admin-suppressreason), which includes the name of the blocking admin. Every cross-wiki suppression in the affected time range will have an incorrect ipb_by_actor, so every row discovered in this way will have to be updated. You can validate the username extracted from the reason by cross-checking it against two other locations where this data is stored:

  • The ipb_by_actor corresponds to a valid actor_id on metawiki which gives the actual admin name
  • The metawiki logging table contains a logging row, which can be identified by correlating by target username, and log_actor is the actual admin username

For example on loginwiki:

MariaDB [loginwiki]> select * from ipblocks where ipb_id=3468\G
*************************** 1. row ***************************
              ipb_id: 3468
         ipb_address: Dem Jivee Blau den Kopf abschneiden!
            ipb_user: 55344425
        ipb_by_actor: 46199926
       ipb_reason_id: 41836
       ipb_timestamp: 20190911222148
            ipb_auto: 0
       ipb_anon_only: 0
  ipb_create_account: 1
ipb_enable_autoblock: 1
          ipb_expiry: infinity
     ipb_range_start: 
       ipb_range_end: 
         ipb_deleted: 1
     ipb_block_email: 1
  ipb_allow_usertalk: 0
 ipb_parent_block_id: NULL
        ipb_sitewide: 1
1 row in set (0.00 sec)

MariaDB [loginwiki]> select * from comment where comment_id=41836\G
*************************** 1. row ***************************
  comment_id: 41836
comment_hash: 1329015040
comment_text: Globally suppressed by Schniggendiller for following reason: Blatant attack user name
comment_data: NULL
1 row in set (0.00 sec)

And the corresponding rows on metawiki:

MariaDB [metawiki]> select * from logging where log_id=32987951\G
*************************** 1. row ***************************
        log_id: 32987951
      log_type: suppress
    log_action: setstatus
 log_timestamp: 20190911222149
 log_namespace: 2
     log_title: Dem_Jivee_Blau_den_Kopf_abschneiden!@global
log_comment_id: 5819
    log_params: locked, suppressed
(none)
   log_deleted: 0
     log_actor: 32044
      log_page: 0
1 row in set (0.00 sec)

MariaDB [metawiki]> select * from actor where actor_id in (32044,46199926)\G
*************************** 1. row ***************************
  actor_id: 32044
actor_user: 154407
actor_name: Schniggendiller
*************************** 2. row ***************************
  actor_id: 46199926
actor_user: NULL
actor_name: Meta>Schniggendiller
2 rows in set (0.00 sec)

The username is suppressed but is not PII, so I don't think there's any privacy problem with quoting it here.

If the message text changes, it's a bit of a nuisance, but we can detect whether that has happened.

I wrote a script to traverse the git history and discover all previous values of a given message, by reading the i18n/*.json files. So that's not a problem. The message has never been customised on meta, and the job runs on meta and so won't see customised messages on other wikis, so i18n/*.json should have all we need.

Fix script: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/640348

Proposed bounding timestamps:

  • Start: 2019-09-10
  • End: 2020-08-25

The start is based on the date at which the causative patch was merged to master. The end is based on T260485#6406277.

I ran the script on all wikis. It completed without errors. The problem appears to be fixed.

tstarling changed the visibility from "Custom Policy" to "Public (No Login Required)".Nov 13 2020, 2:43 AM
tstarling changed the edit policy from "Custom Policy" to "All Users".

I removed the custom security policy on the basis that there is no need for it to be restricted now that it is fixed. I reviewed the comments above for PII and found none.

Change 640348 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Scripts for fixing ipb_by_actor

https://gerrit.wikimedia.org/r/640348