Vandalism on Phabricator: Undo changes made (2018-07-01)
Closed, Resolved (Public)

Assigned To
None
Authored By
Jsamwrites
Jul 1 2018, 6:28 AM
Referenced Files
None
Tokens
"Yellow Medal" token, awarded by Aklapper. "Barnstar" token, awarded by srishakatux. "Yellow Medal" token, awarded by Dinoguy1000. "Barnstar" token, awarded by Kaartic. "Barnstar" token, awarded by Krenair. "Yellow Medal" token, awarded by AfroThundr3007730. "Yellow Medal" token, awarded by MarcoAurelio. "Barnstar" token, awarded by Vachovec1. "Burninate" token, awarded by 1233thehongkonger. "Barnstar" token, awarded by matej_suchanek. "Cup of Joe" token, awarded by Rachmat04. "Barnstar" token, awarded by phuedx. "Barnstar" token, awarded by Liuxinyu970226. "Barnstar" token, awarded by Dalba. "Barnstar" token, awarded by deryckchan. "Barnstar" token, awarded by MacFan4000. "Barnstar" token, awarded by rafidaslam. "Barnstar" token, awarded by ToBeFree. "Barnstar" token, awarded by Ciencia_Al_Poder. "Barnstar" token, awarded by RichSmith.

Description

A number of Phabricator tickets have been vandalised by the user Vvjjkkii (now disabled) during the last 24 hours. There is no option to undo these changes. For example, please take a look at the ticket https://phabricator.wikimedia.org/T195219 and the changes that happened overnight: https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-TASK-zohnmpdgjbql4yy/

Is there any way this problem can be resolved?

Event Timeline

Once the reversions are done, will it be possible to get the traces of this attack wiped? TBH it seems to me like this matter would best be handled at DBA level.

The potential for damage probably outweighs the benefits of attempting to do a DB-level revert. If there were regular snapshots, the entire instance could be rolled back to the most recent good version. That may not be viable, depending on how much legitimate user activity was occurring at the same time.

@Scott: What's the use case of 'wiping'? What does 'wiping' mean?

Doing a revert of the database itself, deleting transactions.

Yeah - on enwp we can use RevDel to remove the visible traces of vandalism, but that's obviously not an option here.

@AfroThundr3007730 wrote:
The potential for damage probably outweighs the benefits of attempting to do a DB-level revert. If there were regular snapshots, the entire instance could be rolled back to the most recent good version. That may not be viable, depending on how much legitimate user activity was occurring at the same time.

Surely the answer would be to roll back the tasks that were touched by the vandal, not the entire DB? Including tasks that have been manually reverted, if nothing new has happened to them since being reverted.

Edit: this upstream comment from 2016 makes it pretty clear we can go whistle rather than expect that to happen. Pity. Manual or bot reverts all round then.

Doing a revert of the database itself, deleting transactions.

Ah, thanks. DB reverts are not planned. As written in T198552#4362457, "Reverting is currently already being performed" but afaik not on a DBA level.

Surely the answer would be to roll back the tasks that were last touched by the vandal, not the entire DB?

Depending on the database schema, that may be non-trivial.

Every interaction with an object in Phabricator produces a transaction, which is referenced with a global transaction ID (PHID). Those transactions are then applied to the referenced object, similar to a diff. And I believe (still reading through the docs/source) that the objects themselves only store their current state, not a history. This may mean that undoing those bad edits would require generating the inverse of each of those transactions, and applying them. In any case, the bot is basically applying those reverse transactions already.

We could generate a list of transactions created by the vandal, but simply deleting them may have unexpected results - namely, the transactions that occurred after them. It may be akin to just deleting a diff from the middle of the chain, which would invalidate any diffs that occurred after. If we just wanted to roll back the affected tasks, and drop everything after the point of vandalism, that may work better than cherry-picking transactions from the middle of the stack. Once the damage has been completely reversed, there would be little harm in leaving the transaction history in place. Not all cases of vandalism on Wikipedia need a RevDel, and I don't believe it should be necessary in this case either.
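The difference between deleting transactions and applying their inverses can be sketched with a toy model. This is only an illustration in Python, not Phabricator's schema or API: the `Transaction` class, its fields, and the helpers `apply`, `invert`, and `revert_author` are all hypothetical names, and the example assumes each transaction records the old and new value of a single field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical stand-in for a transaction (not the real schema)."""
    author: str
    field: str
    old: str
    new: str


def apply(state: dict, txn: Transaction) -> dict:
    """Return a copy of the object state with one transaction applied."""
    updated = dict(state)
    updated[txn.field] = txn.new
    return updated


def invert(txn: Transaction, reverter: str) -> Transaction:
    """Build the transaction that undoes `txn`, authored by `reverter`."""
    return Transaction(reverter, txn.field, old=txn.new, new=txn.old)


def revert_author(state: dict, history: list, bad_author: str, reverter: str):
    """Append inverse transactions for `bad_author`, newest first.

    A field is only restored if it still holds the vandal's value, so a
    later legitimate edit to the same field is left alone.
    """
    new_history = list(history)
    for txn in reversed(history):
        if txn.author == bad_author and state.get(txn.field) == txn.new:
            undo = invert(txn, reverter)
            state = apply(state, undo)
            new_history.append(undo)
    return state, new_history


task = {"title": "Fix login bug", "priority": "High"}
history = [
    Transaction("vandal", "title", "Fix login bug", "asdfgh"),
    Transaction("vandal", "priority", "High", "Lowest"),
    Transaction("alice", "priority", "Lowest", "Normal"),  # later legitimate edit
]
for txn in history:
    task = apply(task, txn)

restored, full_history = revert_author(task, history, "vandal", "cleanup-bot")
```

In this sketch the title is restored, the later legitimate priority change survives, and nothing is removed from the history: the undo is simply one more transaction on top of the stack.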

Perhaps we should ask the Phabricator devs about the feasibility of doing a db-level revert like this, for future reference. Also, see upstream T8830 regarding the issues resulting from deleting objects.

Aklapper renamed this task from "Vandalism on Phabricator: Undo changes made" to "Vandalism on Phabricator: Undo changes made (2018-07-01)". Jul 1 2018, 6:04 PM

We introduced the limit a few hours ago.

If the bot didn't revert the title or anything else, it's because it hit the rate limit.

Sorry if I'm missing something obvious here, but wouldn't it be more reasonable to put the rate limit in place after the bot is done with the reverts?

To take the proxy measure suggested above in T198552#4365794 for estimating how many tickets are still affected:
There are currently about 3210 tickets in https://phabricator.wikimedia.org/tag/gamepress/ , compared to around 3470 at 5:09 pm (after the bot had already been slowed down to avoid hitting the rate limit). That would roughly be 90 tickets cleared up per hour, and 35 hours remaining.

MacFan4000 subscribed.

Seems to be doing the most work

Ugh. Is there any way to grant the bot an exception from the throttle limit?

Was this the same vandal as 14 days ago (T197456)? There should definitely be a permanent throttle on how many tasks one person can modify in a predefined timeframe (e.g. 5 or 10 tasks per minute and/or 100 tasks per hour, with exceptions for admins and certified bots).
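A throttle of the kind suggested here could look roughly like the sliding-window sketch below. It is not how Phabricator implements rate limiting; the `EditThrottle` class, its parameters, and the user name `new_user` are assumptions made for illustration, and the limits use the upper per-minute figure from the suggestion (10 per minute, 100 per hour) with an exemption list for bots.

```python
import time
from collections import defaultdict, deque


class EditThrottle:
    """Sliding-window limit on task edits per user (illustrative only)."""

    def __init__(self, limits, exempt=(), clock=time.monotonic):
        # limits: iterable of (max_edits, window_seconds) pairs
        self.limits = list(limits)
        self.exempt = set(exempt)
        self.clock = clock
        self.longest = max(window for _, window in self.limits)
        self.events = defaultdict(deque)  # user -> timestamps of allowed edits

    def allow(self, user):
        """Return True and record the edit if `user` is under every limit."""
        if user in self.exempt:
            return True
        now = self.clock()
        stamps = self.events[user]
        # Forget edits older than the longest window.
        while stamps and now - stamps[0] >= self.longest:
            stamps.popleft()
        for max_edits, window in self.limits:
            recent = sum(1 for t in stamps if now - t < window)
            if recent >= max_edits:
                return False
        stamps.append(now)
        return True


# A fake clock keeps the example deterministic.
fake_now = [0.0]
throttle = EditThrottle(
    limits=[(10, 60), (100, 3600)],
    exempt={"CommunityTechBot"},
    clock=lambda: fake_now[0],
)
results = [throttle.allow("new_user") for _ in range(12)]
bot_results = [throttle.allow("CommunityTechBot") for _ in range(500)]
fake_now[0] = 61.0
after_wait = throttle.allow("new_user")
```

Here the eleventh and twelfth edits within the same minute are refused, the exempt bot is never limited, and the ordinary user can edit again once the one-minute window has passed.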

Throttle it, add new tools, and I think we are all set. Do not forget these attacks; I think this is only the start.

See my comment in T198552#4367694 about the rate limiting. And from the actions taken, it would seem they were different actors, and that last one didn't hit nearly as many tickets.

Now down to around 2790, i.e. in 11 hours about 680 tickets were repaired or 62 per hour (to be clear, the 90/hour above may have included manual cleanup efforts too). At that rate the ETA would be in about 45 hours from now. However, @MusikAnimal and @mmodell have been doing more weekend work to speed up the bot further. [edit: fixed off-by-one error]
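The estimate can be reproduced with a few lines of arithmetic. The function name `cleanup_eta` is made up for this sketch; the figures are the ones quoted in the comments, and the result assumes the repair rate stays constant and that the Gamepress tag count is a fair proxy for the number of affected tickets.

```python
def cleanup_eta(before, after, elapsed_hours):
    """Return (tickets repaired per hour, hours remaining at that rate)."""
    rate = (before - after) / elapsed_hours
    return rate, after / rate


# 3470 tickets at the first count, 2790 about 11 hours later.
rate, hours_left = cleanup_eta(before=3470, after=2790, elapsed_hours=11)
print(f"{rate:.0f} tickets/hour, about {hours_left:.0f} hours remaining")
# prints "62 tickets/hour, about 45 hours remaining"
```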

Was this the vandal that changed the names of tasks into gibberish? And unsubscribed AKlapper?

yes

Change 443391 had a related patch set uploaded (by MarcoAurelio; owner: MarcoAurelio):
[labs/tools/wikibugs2@master] Ignore 'CommunityTechBot'

https://gerrit.wikimedia.org/r/443391

Change 443391 merged by jenkins-bot:
[labs/tools/wikibugs2@master] Ignore 'CommunityTechBot'

https://gerrit.wikimedia.org/r/443391

Deployed this. If CTB also performs other maintenance tasks that should be announced, we should probably change the implementation (or split the work over two usernames), but this is probably good for now.

It would appear that @Community_Tech_bot is currently tackling these, although I'm not sure it's getting every change. In several cases, it's missed the priority (T197020) or even the title (T197103) when reverting changes. I'm not sure if maybe it's built a queue of changes to revert that aren't necessarily in order and it will get to them eventually, or if it failed on those and just moved on to the next one. Might be something for the operator to look into further, once the bot is done with this run.

Currently running as @CommunityTechBot (a formal phab bot), and as of ~16:00 UTC July 1, it is properly restoring the priority.

It is still regularly hitting the rate limit, as am I. It would be wonderful to make bots exempt. @20after4 tried to help me with this, to no avail.

If the bot didn't revert the title or anything else, it's because it hit the rate limit.

Is the bot going to revisit the transactions that didn't succeed due to it hitting the rate limit, or are we going to have to correct those cases manually?

Around 1650 to go now (according to the Gamepress backlog). This might be done before the day is over.

My 2c. First, the numbers: circa 1500 tasks still need a rollback, and thanks to @MusikAnimal for such a great job. What most surprises (and scares) me is that there was no throttle in place before this whole thing happened. Even with a high limit, something like that needs to exist. Also, the revert functionality from the upstream ticket would be a really nice feature to have, since Phabricator lacks real measures against vandalism (although IMVHO I can't imagine any vandalism-resilient infrastructure without a heuristic system like AbuseFilter). That being said, I agree with MusikAnimal that this task shouldn't be public, at any time.

As a security measure, new wiki accounts should not be able to log in to Phabricator, let alone edit anything. I think some kind of approval should be necessary to edit on Phabricator.

Hi all,

I can see some tasks are still not fixed, and https://phabricator.wikimedia.org/p/Community_Tech_bot/ is not running. I've used the bulk edit feature and a few manual edits to fix at least all tasks on my personal workboard, to enable me to work again. Can somebody re-run the bot, please?

As a security measure, new wiki accounts should not be able to log in to Phabricator, let alone edit anything. I think some kind of approval should be necessary to edit on Phabricator.

All new accounts must now be approved by a Phabricator administrator, per the mail on wikitech-l.

Hi all,

I can see some tasks are still not fixed, and https://phabricator.wikimedia.org/p/Community_Tech_bot/ is not running. I've used the bulk edit feature and a few manual edits to fix at least all tasks on my personal workboard, to enable me to work again. Can somebody re-run the bot, please?

It's now running at https://phabricator.wikimedia.org/p/CommunityTechBot/ instead.

Ah, thank you. Maybe the old account should be disabled to prevent confusion?

Just a note, in https://phabricator.wikimedia.org/T145832 it was proposed that things like editing tasks should be restricted to Trusted-Contributors

As a security measure, new wiki accounts should not be able to log in to Phabricator, let alone edit anything. I think some kind of approval should be necessary to edit on Phabricator.

All new accounts must now be approved by a Phabricator administrator, per the mail on wikitech-l.

I don't believe this is a good solution if we make it permanent; we will miss relevant feedback because of it. Phabricator should have resources to face vandalism, just as the Wikimedia projects have.

No need to assign task to non-humans...

As a security measure, new wiki accounts should not be able to log in to Phabricator, let alone edit anything. I think some kind of approval should be necessary to edit on Phabricator.

That's stuff to discuss in T84 as this task is about reverting the changes. Furthermore it does not make sense because we have third-party MediaWiki users.

We seem to be down to 50+ tasks now!

The bot would appear to have finished! Now all that is left is to fix anything missed by the bot.

The bot has now completed its run. If you see any outstanding tasks that need to be repaired, please give me the task IDs.

The bot ran for roughly 36 hours, repairing at least 4,000 tasks (probably many more).

There were some issues with the bot that may still affect your tasks:

  • The triage level was not restored, or was put in "Needs triage". This was fixed around 16:00 UTC on July 1.
  • For most of the bot's run, it was subject to a newly imposed rate limit. If the rate limit was hit in the middle of repairing a task, the bot may not have fixed everything. Many tasks were affected. This issue was fixed around 15:00 UTC on July 1.
  • For some tasks, the vandal removed tags as well as adding some. The bot did not properly restore the removed tags until around 12:00 UTC on July 2. The number of tasks affected by this is estimated to be low.
  • Some tasks have "custom fields" that were vandalized, which the bot did not restore. An example is the "due date" on T193593. The number of tasks affected by this should be very low.

If you notice any tasks where the bot didn't fix everything, and you don't want to fix it yourself, just give me the task IDs and I can re-run the bot on those.

Thanks to @Aklapper, @mmodell, and everyone else who helped with this effort.

thanks @MusikAnimal !!!

I can confirm this for fr-tech. I reviewed over 200 tasks and I only found 1 task that had a partial restore: T195408

If anyone wants to search their own tasks I suggest this:

  • go to your phab home page
  • click on "Task Search" lower down on the left-hand menu
  • add your project name in the "tag" field
  • in the "updated after" field type "3 days ago"

This should show you all tasks that have been updated since well before the vandalism began. You should be able to scan for any remaining vandalized subjects this way.

If you want a lighter-weight version, I recommend also adding "any closed status" in the status section. This will obviously show closed items only. This is important because closed tasks don't show up on project boards by default. When working on a backlog, you would eventually come across open vandalized tasks but not closed ones. Important tasks that were incorrectly closed could really mess up your backlogs.
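The same search could in principle be scripted against the Conduit API instead of the web form. The sketch below only builds the form fields for a `maniphest.search` request and sends nothing over the network. The helper name `build_search_params`, the token value, and the project identifier are placeholders, and the constraint names (`projects`, `modifiedStart`) should be checked against the Conduit console of the instance before relying on them.

```python
from datetime import datetime, timedelta, timezone


def build_search_params(api_token, project, days_back, now=None):
    """Build form fields for a Conduit `maniphest.search` request.

    `project` is whatever identifier the instance accepts for a tag
    (assumed here to be a project PHID supplied by the caller).
    """
    now = now or datetime.now(timezone.utc)
    updated_after = now - timedelta(days=days_back)
    return {
        "api.token": api_token,
        "constraints[projects][0]": project,
        # Conduit expects epoch seconds for date constraints.
        "constraints[modifiedStart]": int(updated_after.timestamp()),
    }


# Placeholder values only; neither the token nor the PHID is real.
params = build_search_params(
    api_token="api-PLACEHOLDER",
    project="PHID-PROJ-PLACEHOLDER",
    days_back=3,
    now=datetime(2018, 7, 4, tzinfo=timezone.utc),
)
```

The resulting dictionary would be posted as form data to the instance's `/api/maniphest.search` endpoint; restricting to closed tasks would need an additional status constraint whose values depend on how the instance names its statuses.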

The bot would appear to have finished! Now all that is left is to fix anything missed by the bot.

If you notice any tasks where the bot didn't fix everything, and you don't want to fix it yourself, just give me the task IDs and I can re-run the bot on those.

I don't suppose the bot was logging its actions so we could see which items weren't completely restored? Perhaps we should think about implementing this for future incidents.

Glad the majority of the damage is finally over with though.

Krenair subscribed.

I don't suppose the bot was logging its actions so we could see which items weren't completely restored? Perhaps we should think about implementing this for future incidents.

Glad the majority of the damage is finally over with though.

The history is still in the DB so it'd be possible to dig up exactly what tasks were affected if need be.

I don't suppose the bot was logging its actions so we could see which items weren't completely restored?

It's not really helpful, but a log (if you want to call it that) is at https://phabricator.wikimedia.org/feed/?userPHIDs=PHID-USER-7ztcnkn7p4jjem6mirro. All actions under that account were part of the cleanup effort.

It's not really helpful, but a log (if you want to call it that) is at https://phabricator.wikimedia.org/feed/?userPHIDs=PHID-USER-7ztcnkn7p4jjem6mirro. All actions under that account were part of the cleanup effort.

What I meant was whether the bot itself was keeping a log of successes and failures for each action it took. That would've made the final cleanup phase much easier, if we could just have it replay the failed actions.

As of now we'd need to touch each of those tasks manually to see what was missed, and we'll still probably overlook a few, since probably nobody is going to look at all 4k of them.

Another idea: How difficult would it be to pull the action history of the vandal and the bot, and do sort of a diff between them to see which actions are still outstanding? That would allow for a targeted run to finish them off.
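As a rough sketch of that idea, assuming both histories could be exported as lists of (task, field) pairs, the outstanding work is the set of pairs the vandal touched that the bot never did. The function name, the data shape, and the task labels below are all invented for illustration; nothing here reflects how Phabricator actually stores or exposes its transaction history.

```python
def outstanding_actions(vandal_actions, bot_actions):
    """Return (task, field) pairs the vandal changed that the bot never touched."""
    repaired = set(bot_actions)
    seen = set()
    remaining = []
    # Keep the vandal's original order and drop repeated pairs.
    for action in vandal_actions:
        if action not in repaired and action not in seen:
            seen.add(action)
            remaining.append(action)
    return remaining


# Made-up sample data: the bot skipped the priority on one task.
vandal_actions = [
    ("task-a", "title"),
    ("task-a", "priority"),
    ("task-b", "title"),
    ("task-b", "tags"),
]
bot_actions = [("task-a", "title"), ("task-b", "title"), ("task-b", "tags")]
todo = outstanding_actions(vandal_actions, bot_actions)
```

A comparison like this would only give candidates for a targeted run: a pair missing from the bot's history may already have been fixed by hand, and a pair that is present does not prove the original value was restored.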

I'm not well versed enough in Phabricator internals to know whether this would be a viable solution.

A proper failure log would have been the smart thing to do, but unfortunately no log was kept. Without saying too much on this public task, the order of operations that the bot went by should mean it eventually revisited any tasks that it didn't finish before. Using other tactics, I just managed to find an additional 15 or so tasks that were still vandalized, and they're fixed now. So at this point, I'm fairly confident there aren't very many (if any) tasks that are still vandalized. Triage levels may still be wrong for many tasks, however. I have some ideas on how to fix those, and will look into it soon.

The bot has re-run through most if not all tasks and restored the triage level. In the process it fixed a few other things (missing tags, subscribers, etc.) that the bot missed in previous runs, or users missed when repairing the tasks manually. The only thing outstanding is a handful of tasks that had custom fields like "due date" (some already fixed by Aklapper), and at least one that had something called "blocked" (don't have the task ID, unfortunately).

As far as the bot is concerned, I think we're about as close to having this resolved as we can be.

The cleanup has been completed.
Thanks to the many, many people who helped (on a Sunday), whether by reverting manually or with technical work.

(Discussion is ongoing about countermeasures in the future; some of them have already been taken. More to come in the Incident Report.)