
Split Thanos components from thanos-fe hosts into titan hosts
Closed, Resolved (Public)

Description

This task tracks the implementation of splitting the Thanos components from their current home (the thanos-fe* hosts). In other words, thanos-fe / thanos-be will be left to run only Swift for storage. The new hosts and role will be named titan.

The following steps are @fgiunchedi's proposal on how to proceed:

  • Decide on a name for the hosts/project (this is by far the hardest part). Filippo's suggestion is something related to the MCU's Thanos; anything short will do, for example vision, tony, glove, etc.
  • Procure hardware: T341237, T341236
  • Create new puppet roles to run Thanos only and apply them to the new hosts (T341999)
  • Pool the new hosts for the thanos-query LVS service only, and verify that queries are served as expected (T341999)
  • Depool the thanos-fe hosts from the thanos-query LVS service
  • Clean up the existing puppet role thanos::frontend (T346143)

Event Timeline

fgiunchedi mentioned this in Unknown Object (Task). Jul 10 2023, 3:26 PM

@MatthewVernon @Eevans please let me know what you think of the above proposal. I was imagining the final state to be thanos-fe / thanos-be running only Swift, and the new hardware to be running Thanos only.

All seems reasonable to me!

P.S. Isn't Titan the home of Thanos? ;)

Very good point! I'd definitely be for titan.

It might not be possible, but if we could end up with the thanos-fe* nodes running swift::* classes, that would be nice. It feels like it ought to be doable once they're no longer also running Thanos?

That notwithstanding, this seems like a good approach, thank you. I have no further opinions to offer on naming things :)

Yes, having the thanos-fe* nodes run swift::* classes is definitely doable and desirable too!

Thank you! Unless I hear otherwise by the beginning of next week, I'm going with titan for the hostname and role name.

fgiunchedi renamed this task from Split Thanos components from thanos-fe hosts to Split Thanos components from thanos-fe hosts into titan hosts. Jul 17 2023, 2:26 PM
fgiunchedi updated the task description.
lmata triaged this task as Medium priority. Jul 18 2023, 5:08 PM
lmata moved this task from Inbox to Prioritized on the Observability-Metrics board.
lmata moved this task from Inbox to Up next on the SRE Observability (FY2023/2024-Q1) board.

Re: the last point, namely cleaning up the Thanos components off thanos-fe (thereby leaving only Swift): I initially thought of going the ensure => absent route, though that seems more trouble than removing the Thanos profiles from the thanos::frontend role and roll-reimaging the thanos-fe hosts. What do you think, @MatthewVernon?

Yeah, if it's easier to just reimage them (especially if that can be done with swift::* profiles), I'm entirely happy for you to do that - just let me know when so I don't start rebooting the others at the same time :-)

Mentioned in SAL (#wikimedia-operations) [2023-09-13T12:17:39Z] <godog> pool only titan hosts for thanos-web and thanos-query services - T341488

fgiunchedi claimed this task.
fgiunchedi updated the task description.

This is done: the titan* hosts and thanos-fe* are now split, with the former running the Thanos components and the latter running swift-proxy to serve thanos-swift object storage.

Change 969305 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Update role contact for thanos frontend

https://gerrit.wikimedia.org/r/969305

Change 969305 merged by Muehlenhoff:

[operations/puppet@production] Update role contact for thanos frontend

https://gerrit.wikimedia.org/r/969305

Done, thanks for the reminder.