
Get performance team green light for Cloud NAT to wikis change
Open, Medium, Public

Description

Before introducing the change in the parent task (https://wikitech.wikimedia.org/wiki/News/CloudVPS_NAT_wikis), we would like to get a green light from the Performance team.

Placeholder task. More information will be added soon.

Event Timeline

aborrero triaged this task as Medium priority. Feb 3 2021, 10:58 AM
aborrero created this task.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.
From the task description, by @aborrero:

Placeholder task. More information will be added soon.

I'll wait for that information to understand what specifically our role in the NAT change is and how we can help.

Meanwhile, I can share a few general thoughts from my own individual perspective, most of which I think has already been said in various places, but here goes:

  • The exposure of internal IPs is generally undesirable, as these could become part of the public record, which would be confusing because they don't really make sense from the outside (naturally). They can likely also cause subtle bugs downstream or within our infrastructure, as we don't expect to encounter such IPs there.
  • In the current setup, I believe the best we could do, and have done, is to at least prevent these from being recorded in edit metadata, since that part of the public record is the most widely consumed and interacted with. We do this through multiple layers and methods, including hardcoded blocks in wmf-config. I imagine there are also various (redundant) on-wiki blocks, both local to individual wikis and global.
  • It is very much desirable for edits to come in through Cloud VPS, and we know a double-digit percentage of edits does in fact come from there, generally semi-automated. As I understand it, there is unanimous consensus between the communities and among different stakeholders in the Foundation, that no logged-out edits or account registrations are allowed from Cloud VPS. As such, these blocks should remain in place and be updated accordingly to cover the new (external) IP ranges attributed to Cloud VPS.
  • In the current setup, the fact that each Cloud VPS instance or project is attributed a distinct internal IP allows us to trace, to some extent, exactly where edits come from. This has two possible benefits today:
    • In the unlikely event all the layers of blocks stop working, we will know which instance attempted to make a logged-out edit. However, this doesn't seem useful or relevant to me, since the problem is not that such an attempt was made. The problem is that the block stopped working. As far as I'm concerned, we should assume every Toolforge user has a cronjob intentionally trying to edit logged-out, just to mess with us, and I'd be totally happy with that. This sort of mistake can and will happen in bot software from time to time, and it's our duty to block it so that bot operators notice and can fix their code. I see no value in knowing or determining where such an attempt came from, other than the boolean signal that "Cloud VPS" isn't fully blocked. That boolean signal will still exist even when all of Cloud VPS shares the same IP attribution.
    • In the event that a bot is misbehaving, and an on-wiki administrator blocks the bot account, there is a checkbox "Block underlying IP address from editing including from any other account". If they tick that box today, that is almost certainly a mistake. As I understand it, it has been a well-understood and well-documented procedure for over a decade (even back in the Toolserver days) that bots running on Toolforge (or elsewhere in Cloud VPS) must never be blocked by their underlying IP address, because doing so could block other bots running on the same infrastructure. There are occasional exceptions today where a bot has its own VM elsewhere in Cloud VPS, which in theory can be blocked including its IP without the mistake causing noticeable issues, but I don't see why someone would do this intentionally. As far as I know, this happy accident isn't documented anywhere, and admins generally don't need to care about the detail that the checkbox is useless but safe for 1% of bots. As such, I expect this to be a non-issue.
  • I suppose it could provide a very small and theoretical benefit if we at least retained a distinct IP attributed to Toolforge (separate from the rest of Cloud VPS), so that in the rare case we do have to diagnose internal traffic issues, we can easily narrow it down somewhat. So I'm supportive of that idea, but wouldn't consider it hugely important or mandatory if making that distinction is undesirable or infeasible for any reason.
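To make the points above concrete, here is a minimal Python sketch of what the "is this Cloud VPS?" decision reduces to once traffic is NATed: a prefix match against a few external egress ranges, with an optional, more specific Toolforge range for narrowing things down. It is not the actual wmf-config code (which is PHP and not shown here), and the ranges are hypothetical RFC 5737 documentation prefixes, not the real Cloud VPS or Toolforge addresses. The `classify` and `may_edit_logged_out` names are likewise made up for illustration.

```python
import ipaddress

# HYPOTHETICAL ranges for illustration only (RFC 5737 documentation prefixes),
# NOT the real Cloud VPS / Toolforge egress addresses.
CLOUD_VPS_EGRESS = [ipaddress.ip_network("198.51.100.0/24")]
# Optional, more specific range that would let us tell Toolforge apart.
TOOLFORGE_EGRESS = [ipaddress.ip_network("198.51.100.16/29")]


def classify(ip: str) -> str:
    """Return 'toolforge', 'cloud-vps' or 'external' for a client IP."""
    addr = ipaddress.ip_address(ip)
    # Check the more specific Toolforge range first, since it is nested
    # inside the wider Cloud VPS range.
    if any(addr in net for net in TOOLFORGE_EGRESS):
        return "toolforge"
    if any(addr in net for net in CLOUD_VPS_EGRESS):
        return "cloud-vps"
    return "external"


def may_edit_logged_out(ip: str) -> bool:
    """Logged-out edits are refused from anywhere in Cloud VPS, Toolforge included.

    This is the "boolean signal": it only says whether the source is Cloud VPS,
    not which instance sent the request.
    """
    return classify(ip) == "external"


if __name__ == "__main__":
    for ip in ("198.51.100.17", "198.51.100.200", "203.0.113.5"):
        print(ip, classify(ip), may_edit_logged_out(ip))
```

Under NAT, every instance behind an egress range looks the same to this check, which is exactly why the block still works but per-instance tracing goes away.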

bots running on Toolforge (or elsewhere in Cloud VPS) must never be blocked against their underlying IP address, because doing so could block other bots running on the same infrastructure.

We can add a notice to the shared IP's user page, similar to what we used to have for the shared IP of Qatar: https://en.wikipedia.org/w/index.php?title=User:82.148.97.69&oldid=760432863

One idea I had: on paper, IPv6 should be faster than IPv4, given that IPv6 has no header checksum for routers to compute. I've heard that in reality this isn't the case, though. If anyone knows more about this, it would be great to mention it. (I personally would love to have the whole infrastructure on IPv6.)
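For context on the "routers don't do checksums" argument, here is a small Python sketch of the per-hop work it refers to. An IPv4 router decrements the TTL of every packet it forwards, and because the TTL is covered by the header checksum (the RFC 1071 one's-complement sum), the checksum must be updated too; IPv6 removed the header checksum, so there is nothing to update. The `forward` helper is a hypothetical simulation of a single hop, and the sample header is a standard textbook example. This only illustrates where the theoretical saving comes from; it says nothing about real-world performance, since routers usually do this in hardware.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """RFC 1071 one's-complement checksum over an IPv4 header.

    Over a header whose checksum field (bytes 10-11) is zero, this returns the
    checksum to store. Over a header with a valid checksum, it returns 0.
    """
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]  # sum 16-bit big-endian words
    while total > 0xFFFF:                          # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes) -> bytes:
    """Hypothetical single IPv4 router hop: decrement TTL, recompute checksum."""
    h = bytearray(header)
    h[8] -= 1                   # TTL lives in byte 8
    h[10:12] = b"\x00\x00"      # zero the checksum field before recomputing
    h[10:12] = ipv4_header_checksum(bytes(h)).to_bytes(2, "big")
    return bytes(h)


# 20-byte header (TTL 0x40, UDP) with the checksum field zeroed.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)))  # 0xb861

hop = forward(hdr)
print(hop[8], hop[10:12].hex())        # 63 b961
```

Every IPv4 hop repeats this update. An IPv6 router only decrements the Hop Limit field.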

Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.

Considering this done for now. Feel free to close if satisfied, or move back to our inbox if there's further questions/need from our team on this.