Page MenuHomePhabricator

Rename stat100x machines to have misc element names
Closed, ResolvedPublic

Description

You expect fooX00N to all be nearly-identical in role, function etc. These do not comply with that expectation.
https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Stats_machines demonstrates most of this, but does not mention stat1001 that appears to be a webserver

Event Timeline

elukey subscribed.

The Analytics team is going to refactor roles and functionalities of these machines, you are right that their usage is a bit confusing (https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups is a nice picture showing it :).

stat1001 is only a webserver for various analytics websites, users can't usually connect to it, this is why it is not listed.

+ 1/2 to this idea. I'm all for renaming these boxes, not sure if misc element names is the way to go, but it might be!

+ 1/2 to this idea. I'm all for renaming these boxes, not sure if misc element names is the way to go, but it might be!

Can you (or someone) give us a brief one-liner of what each of them does, just as you did for stat1001? Maybe we'll end up with statwebserver and a couple other similar names, who knows.

  • stat1001 - app/webserver, no analyst/research access.
  • stat1003 - compute node, lots of storage, mostly used by researchers to connect to MySQL.
  • stat1002 - compute node, lots of storage, with private data and Analytics Cluster (Hadoop) access.
  • stat1004 - compute node, limited storage, with Analytics Cluster (Hadoop) access.

stat1002 and stat1004 are very similar in user usage, but stat1004 is intended more for direct work with Hadoop, whereas on stat1002 folks do more local computation, but also have access to Hadoop. stat1004 was mostly created because stat1002 was getting really busy!

  • stat1002 - compute node, lots of storage, with private data and Analytics Cluster (Hadoop) access.

You mention 'private data' specifically, but at least one host listed without this mention (stat1003) contains MySQL credentials for data that is very much private.

Perhaps so, but that data is not stored on stat1003. Those MySQL dbs are theoretically accessible from anywhere in the prod network, if you have the proper MySQL user creds. stat1003 is more of a just a general purpose compute node, and often used as a bastion from which researchers connect to MySQL.

Okay. What sort of private data (beyond credentials to external systems) is stored directly on stat1002?

Most notably, (and historically), sampled webrequest logs in the udp2log
format.

We actually renamed 1001 to be thorium. It is no longer a stats box (in name).