Page MenuHomePhabricator

Suppressed username shown on File pages
Open, MediumPublic6 Estimated Story Points

Description

When a user is blocked with username suppression, the username is suppressed for old revisions of a file, but the current version has the (suppressed) username shown.

To reproduce,

  • Upload an image as a user
  • Upload a second version of the image
  • Login as an admin and block the uploading, selecting "Hide username from edits and lists"
  • As a logged out user, view the File page. The username is still visible on the current revision, but not for old revisions.

Example of affected file with a single revision: Schachboxen1.jpg
Example of affected file with multiple revisions: Toolse_linnuse_varemed.jpg

Patches

  • - add image.img_deleted, populate it from ipblocks, update oldimage.oi_deleted/filearchive.fa_deleted based on ipblocks to account for past image -> oldimage/filearchive transitions
  • - support revdel flags on current versions of images
  • - invalidate image cache when needed
  • - trigger cache invalidation from CentralAuth global locks
  • - make image table dump private (patch made against ariel branch of operations/dumps
  • - hide username in Labs replicas (patch made against operations/software)

One-time script files for WMF cluster update:

T90300-add-img_deleted.sql:

ALTER TABLE image ADD img_deleted tinyint unsigned NOT NULL default 0;

T90300-populate-img_deleted.sql:

UPDATE image JOIN ipblocks ON img_user_text = ipb_address
SET img_deleted = img_deleted | 12
WHERE ipb_deleted;

UPDATE oldimage JOIN ipblocks ON oi_user_text = ipb_address
SET oi_deleted = oi_deleted | 12
WHERE ipb_deleted;

UPDATE filearchive JOIN ipblocks ON fa_user_text = ipb_address
SET fa_deleted = fa_deleted | 12
WHERE ipb_deleted;

WMF cluster update plan

  1. run foreachdb sql.php /home/tgr/T90300-add-img_deleted.sql
  2. apply all patches
  3. run foreachdb sql.php /home/tgr/T90300-populate-img_deleted.sql
  4. run script via foreachwiki eval.php to invalidate cache (if we care enough):
$dbr = wfGetDB( DB_SLAVE ); $res = $dbr->select( 'image', LocalFile::selectFields(), array( $dbr->bitAnd( 'img_deleted', 12 ) . ' = 12' ), 'T90300' ); foreach ( $res as $row ) { RepoGroup::singleton()->getLocalRepo()->newFileFromRow( $row )->invalidateCache(); }

Notices

  • removes public image table dumps (image table will still be available via Tool Labs)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
In T90300#1110502, @Tgr wrote:

Seems sane enough. Is there an associated change set in gerrit?

No; I assumed since this is a security bug, it should only go to gerrit once it has been deployed on the cluster.

Correct, at least in general.

it should still wait for slaves between each wiki (so that shard masters with multiple wikis don't accumulate troublesome replication lag).

How do I do that?

wfWaitForSlaves()?

Tgr added a comment.Mar 11 2015, 8:31 PM

wfWaitForSlaves()?

Thanks! I take it that means there is no pure-SQL way of doing it?

Tgr updated the task description. (Show Details)Mar 16 2015, 8:02 PM
Tgr added a subscriber: coren.Mar 16 2015, 8:05 PM

Apparently Labs uses a completely different setup for replication; added another patch. Adding @coren for review.

The good news is that Labs tools won't be completely broken then (only as far as they can't handle NULL in username fields), only dump-based tools, which is probably not that many these days.

coren added a comment.Mar 16 2015, 8:16 PM

Thankfully, Labs tools are (should be) able to handle NULL in username fields as this is also a normal effect of supression in edits.

coren added a comment.Mar 16 2015, 8:49 PM

Also note that this is similar to, and related to T91706 where the issue is the username in the user table itself (also from block-and-hide). No opinion on whether this should be one fix to rule them all.

Tgr added a comment.Mar 16 2015, 8:52 PM

wfWaitForSlaves()?

sql.php does that already, nice!

Tgr updated the task description. (Show Details)Mar 16 2015, 8:58 PM
coren added a comment.Mar 16 2015, 9:15 PM

The patch to maintain-replicas has my +1. Please poke me when you want it applied or - alternately - anyone can do so from root on labstore1001 (the operations/software tree is cloned into root's home):

# ./maintain-replicas.pl -u image

(-u image tells the tool to forcibly recreate the view in all databases where it exists).

csteipp added a parent task: Restricted Task.Mar 19 2015, 9:28 PM

@Tgr, sorry for the slow review!

didn't account for table prefixes for a few of the tables-- I made a new version, and rebased it on master.

In testing this, I'm seeing something odd with the File: page and caching. When the same user uploads multiple versions of a file, then I block + hide that user, on the File: page, the old versions of the file immediately have the correct "(username removed)" for the username, but the current revision still has the username visible. I can sometimes add purge=1 to the url to get it to correctly hide the username, but sometimes I have to flush memcache.

So it seems like the archived versions of the file are flushing the File object out of cache, but the current version isn't. Any idea how to fix that?

csteipp reassigned this task from csteipp to Tgr.May 11 2015, 11:47 PM
csteipp triaged this task as Medium priority.
csteipp added a project: Security-Team.
csteipp moved this task from Incoming to In Progress on the Security-Team board.
Tgr added a comment.May 12 2015, 8:27 AM

I don't think old versions are cached at all (OldLocalFile::getCacheKey() should disable that). I don't see an easy fix - RevisionDeleteUser uses mass DB updates to set the flags, and there is no equivalent for the cache, so the only thing I can think of is a job that iterates through all images of the user and invalidates them. (Which means that for global suppression, a job will iterate through the wikis and fire off a thousand other jobs which iterate through the images... not sure if that would cause any complications. If it's problematic, we could check if the DB update touched any rows and only fire the job when needed.)

@aaron, any ideas on how to deal with this caching issue?

aaron added a comment.May 21 2015, 6:31 PM

Why not just have RevisionDeleteUser iterate over the image rows to be changed and call invalidateCache()? That should be able to handle a fair number of edits (and it's not likely such users will have many files). AFAIK, the doLocalSuppression call can already run in jobs. If we just removed $wgCentralAuthWikisPerSuppressJob and always used the queue for simplicity, and pushed the jobs to the local wikis rather than the central queue, the cache purges would work. CentralAuthSuppressUserJob looks like it would work on local wikis.

I made https://gerrit.wikimedia.org/r/212579 to speed up such purges a bit.

Tgr updated the task description. (Show Details)Jun 2 2015, 6:54 PM
Tgr added a comment.Jun 3 2015, 12:42 AM

Did that, don't really like the result:


@aaron, couldn't we expose the shared repo cache key to other wikis instead?

Tgr updated the task description. (Show Details)Jun 3 2015, 1:21 AM

Fixed a stupid mistake in the Labs replication patch.

@Tgr, I applied

, which doesn't look like it fixed the issue for local suppression blocks. Is there a local patch similar to the centralauth patch, so that the cache is invalidated? Wasn't sure if I missed something or if you're still working on it.

Tgr added a comment.Jun 13 2015, 12:02 AM

How do you trigger a local suppression block? RevisionDeleteUser::invalidateCache() needs to be called, and right now that's the responsibility of the code that calls the other RevisionDeleteUser methods (and I only did it for CentralAuth). Maybe it should be called by those methods instead; I didn't want to do that because those work when called cross-wiki and invalidateCache does not and that would have been confusing.

Just do a normal block on a username, and there's an option to also
suppress the username, if you have the right (iirc).

Tgr updated the task description. (Show Details)Jun 25 2015, 5:01 AM
Restricted Application added a project: Cloud-Services. · View Herald TranscriptJun 25 2015, 5:01 AM
Tgr added a comment.Jun 25 2015, 5:03 AM

@csteipp: updated the patch (it's in the task summary). Looks like undoing a suppress block via Special:Block is currently broken, the patch fixes that as well.

aaron added a comment.Jun 26 2015, 7:17 AM
In T90300#1332550, @Tgr wrote:

Did that, don't really like the result:


@aaron, couldn't we expose the shared repo cache key to other wikis instead?

Yes, though I think we should be careful about cross-wiki code and the existing pattern seems a bit like a mistake. It's already there though and building on it is not intolerable though. In any case it should either fully work crosswiki or not pretend to.

Tgr updated the task description. (Show Details)Jun 26 2015, 10:23 PM

There was a stray comma in the sql patchfile.

@Tgr, thanks for the new patches, I confirmed the issue is fixed now!

I'm a little worried about the move to only using jobs in centralauth-- @aaron, do those run in their own queue so we can be sure that they're going to get run soon after the user is hidden?

If so, I'm happy to go ahead and deploy these.

@Tgr, thanks for the new patches, I confirmed the issue is fixed now!

I'm a little worried about the move to only using jobs in centralauth-- @aaron, do those run in their own queue so we can be sure that they're going to get run soon after the user is hidden?

If so, I'm happy to go ahead and deploy these.

They don't, just the regular high-priority queue. It might makes sense to leave it that way unless it's too slow. If we have slow jobs in the high priority queue that back up others, then they should be moved the low priority queue rather than making new queue loops, generally speaking.

@jcrespo / @Springle,

This all looks good from the security side, I'd like to start getting the fixes rolled out. Tgr outlined the task description, but basically,

WMF cluster update plan

  1. run foreachdb sql.php /home/tgr/T90300-add-img_deleted.sql
  2. apply all patches
  3. run foreachdb sql.php /home/tgr/T90300-populate-img_deleted.sql
  4. run script via foreachwiki eval.php to invalidate cache (if we care enough):

Can we start working on #1?

jcrespo added a comment.EditedJul 29 2015, 7:01 PM

Unless @Springle is against it, I am ok with the intended work, I would be against the method, specially for commons which has 36 million rows on image and that would disable file uploads for over 1 hour, plus create a 1 hour lag on the slaves. For other wikis with no images or a small number, it is ok (<1000 rows).

I would recommend using pt-online-schema change for the larger tables (this table has a PRIMARY KEY). We can take care of this, with the usual questions (where and when?).

For the updates, make sure they are done in <1 sec steps, perform several of them otherwise.

Re: query optimization, there is a big difference between the IN () and the JOIN version (apparently MySQL is really stupid about handling uncorrelated IN subqueries), so using joins and the (both faster and more correct) username-based joins:

This is fixed on 5.6/10. We recently dumped 5.5 on the slaves, and will do on the masters soon.

csteipp moved this task from In Progress to Waiting on the Security-Team board.Jul 31 2015, 6:00 PM

Unless @Springle is against it, I am ok with the intended work, I would be against the method, specially for commons which has 36 million rows on image and that would disable file uploads for over 1 hour, plus create a 1 hour lag on the slaves. For other wikis with no images or a small number, it is ok (<1000 rows).

I would recommend using pt-online-schema change for the larger tables (this table has a PRIMARY KEY). We can take care of this, with the usual questions (where and when?).

@jcrespo, I'd be happy for you or Sean to do this. It would need to be applied to all wikis, whenever you're able to schedule it.

For the updates, make sure they are done in <1 sec steps, perform several of them otherwise.

I'm not exactly clear on what you're suggesting here-- are you saying setup a limit and repeatedly run this where each run finishes under 1 second? Something like running,

UPDATE image JOIN ipblocks ON img_user_text = ipb_address
SET img_deleted = img_deleted | 12
WHERE ipb_deleted AND img_deleted = 0
LIMIT <something sane that finishes under 1 second>;

Will apply the change, report back when finished.

I'm not exactly clear on what you're suggesting here-- are you saying setup a limit and repeatedly run this where each run finishes under 1 second? Something like running,

UPDATE image JOIN ipblocks ON img_user_text = ipb_address
SET img_deleted = img_deleted | 12
WHERE ipb_deleted AND img_deleted = 0
LIMIT <something sane that finishes under 1 second>;

That would probably work. If you want to be 100% sure of changes being applied in less than 1 second, page using the primary key (LIMIT could take sometimes too much time if a full table scan or large range is done). Take a 1 second pause between updates or monitor the lag during its application. If that is too difficult, ask for help to the DBAs after code is applied.

@csteipp I've deployed it to testwiki. Sincerely, this seems like a very strightforward change (except for large tables), but if you want to do a final check or test my previous comment suggestions there (for example, 1000 rows updated at a time is a good starting point), I will wait a day before applying it to the rest of the wikis.

Deployment update:

The schema change was applied to all wikis in an online fashion (pt-table-checksum), with the exception of commonswiki. The reason for that is that due to the high SELECT rate of that table on commons, the change created high contention due to metadata locking, so I stopped and rolled back the change. Please note that the issue here is the schema change, not the resulting schema.

I am now applying the change in a rolling-update fashion, depooling and repooling one server at a time -hopefully that will avoid the load issues. The issue with this method is that the change takes between 2-4 hours per server and has to be mostly serialized, so it will probably take me over a day to complete it (but is is ongoing right now).

csteipp removed a parent task: Restricted Task.

ALTER TABLE image ADD img_deleted tinyint unsigned NOT NULL default 0; (or more precisely, a complex list of operations with the same result but without affecting production) has been run on ALL WMF wikis.

I suppose that concludes the first step.

Be careful with Commons: there, image is a very busy 30-million row table, and even simple operations can generate lag.

ALTER TABLE image ADD img_deleted tinyint unsigned NOT NULL default 0; (or more precisely, a complex list of operations with the same result but without affecting production) has been run on ALL WMF wikis.

You missed labswiki, which I just did (as there's only 260 rows there).

In my search to find out why this column had appeared on some wikis but not labswiki, I came across T38497 and T22742 which also suggested adding this column years ago.

csteipp moved this task from Waiting to Ready on the Security-Team board.Aug 18 2015, 11:57 PM
Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board.Sep 4 2015, 6:25 PM
Restricted Application added a project: Commons. · View Herald TranscriptSep 4 2018, 3:23 PM
Bawolff added a subscriber: Bawolff.Sep 4 2018, 3:24 PM

@Tgr Just to clarify, are you still working on this/Does anyone know the status?

Tgr added a comment.Sep 4 2018, 4:21 PM

I think this was just waiting for the patches to be deployed (but that was 3 years ago so they need rebase / re-review now), not sure where / why exactly that got stuck. Are you still interested in getting it out? I'll be mostly on vacation until end of next week, can look at it after that.

chasemp moved this task from Ready to Back Orders on the Security-Team board.
Restricted Application added a project: Platform Engineering. · View Herald TranscriptDec 23 2019, 5:12 PM
Aklapper removed Tgr as the assignee of this task.Jun 19 2020, 4:25 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Tgr added a subscriber: sbassett.Jul 5 2020, 7:54 PM

@sbassett is the Security team interested in seeing this through? If not, can it be made public? (It's an infoleak, but a very minor one, and not triggered by user action, so public knowledge of the bug wouldn't make much difference.)
It was a nontrivial amount of work, and the patch clears up a significant amount of tech debt in the image table, so I'd like to see it through one way or another (now that I remember it exists... thanks for the reminder, @Aklapper).

sbassett moved this task from Watching to Incoming on the Security-Team board.EditedJul 6 2020, 2:53 PM

@Tgr - moving this to our clinic meeting today. If the work can go through gerrit sooner than later, that sounds like the easier path to get it deployed, especially for what I'd personally consider a fairly low-risk vulnerability.

@Tgr - Security-Team is fine with this being public, given what we perceive to be a fairly minor information disclosure, especially if the work can proceed sooner than later within gerrit. I'll make this public now.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".Jul 6 2020, 3:24 PM
sbassett moved this task from Incoming to Watching on the Security-Team board.

Change 609798 had a related patch set uploaded (by Reedy; owner: CSteipp):
[mediawiki/core@master] Add img_deleted column

https://gerrit.wikimedia.org/r/609798

Reedy added a subscriber: Reedy.EditedJul 6 2020, 3:47 PM

Seems the img_deleted column was only recently removed from WMF wikis... T250055: Remove image.img_deleted column from production

Tgr changed the status of subtask T257222: Add img_deleted column from Open to Stalled.Jul 6 2020, 4:31 PM

@Tgr - Security-Team is fine with this being public, given what we perceive to be a fairly minor information disclosure, especially if the work can proceed sooner than later within gerrit. I'll make this public now.

Thanks! Will try to re-test things in the next week or two.

Majavah added a subscriber: Majavah.Jul 6 2020, 8:14 PM
Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:02 PM