User Details
- User Since
- May 30 2017, 5:25 PM (457 w, 5 d)
- Availability
- Available
- IRC Nick
- herron
- LDAP User
- Herron
- MediaWiki User
- Unknown
Fri, Mar 6
Mon, Mar 2
root@titan2001:~# journalctl -u thanos-compact | grep halt | tail -n 1 | sed -nr 's/^.*\[(.*)\].*$/\1/p' | tr -s ' ' '\n' | awk -F '/' '{print $NF}' | xargs -I % thanos tools bucket --objstore.config-file /etc/thanos-compact/objstore.yaml mark --id=% --marker=no-compact-mark.json --details="compactor halted due to size"
ts=2026-03-02T23:01:17.563717752Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-03-02T23:01:18.006507177Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KJ0NM3F93EV8AVPN8F6WBPD9
ts=2026-03-02T23:01:18.006537409Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KJ0NM3F93EV8AVPN8F6WBPD9
ts=2026-03-02T23:01:18.006565428Z caller=main.go:174 level=info msg=exiting
ts=2026-03-02T23:01:18.03052689Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-03-02T23:01:18.477043378Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KJ0R8JBTZXJ37MB15DFX75W0
ts=2026-03-02T23:01:18.477073746Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KJ0R8JBTZXJ37MB15DFX75W0
ts=2026-03-02T23:01:18.477104135Z caller=main.go:174 level=info msg=exiting
ts=2026-03-02T23:01:18.502219064Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-03-02T23:01:18.968827169Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KJ4YGDFGVC3X10837J9Q2MMW
ts=2026-03-02T23:01:18.968856936Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KJ4YGDFGVC3X10837J9Q2MMW
ts=2026-03-02T23:01:18.968892937Z caller=main.go:174 level=info msg=exiting
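For reference, here is a rough Python equivalent of the pipeline above, mainly to document what each stage does:

- `grep halt | tail -n 1` picks the last halt line.
- The `sed` expression captures whatever sits inside the last `[...]` pair, which is the "compact blocks" path list.
- `tr`/`awk` keep the final path component of each path, which is the block ULID.
- `xargs` runs one `thanos tools bucket ... mark` per ULID.

This is only a sketch and not something we actually run. It builds and prints the same command lines, with the flags copied verbatim from the command above. The sample journal line is trimmed from the Feb 6 halt message, and the helper names are hypothetical.

```python
import re
import shlex
from typing import List, Optional

OBJSTORE_CONFIG = "/etc/thanos-compact/objstore.yaml"


def last_halt_line(journal_lines: List[str]) -> Optional[str]:
    """Equivalent of `grep halt | tail -n 1`."""
    halts = [line for line in journal_lines if "halt" in line]
    return halts[-1] if halts else None


def ulids_from_halt_line(line: str) -> List[str]:
    """Equivalent of the sed / tr / awk stages."""
    # The greedy leading .* makes the capture start at the last '[' that has a ']'
    # after it, i.e. the "compact blocks [...]" list rather than earlier brackets
    # such as the PID in "thanos-compact[1053544]".
    match = re.match(r"^.*\[(.*)\].*$", line)
    if match is None:
        return []
    # tr -s ' ' '\n' splits on runs of spaces; awk -F '/' '{print $NF}' keeps
    # the last path component, which is the block ULID.
    return [path.split("/")[-1] for path in match.group(1).split()]


def mark_commands(ulids: List[str]) -> List[List[str]]:
    """Build (but do not run) one `thanos tools bucket mark` argv per block."""
    return [
        [
            "thanos", "tools", "bucket",
            "--objstore.config-file", OBJSTORE_CONFIG,
            "mark",
            f"--id={ulid}",
            "--marker=no-compact-mark.json",
            "--details=compactor halted due to size",
        ]
        for ulid in ulids
    ]


if __name__ == "__main__":
    sample = (
        'Feb 06 21:29:32 titan2001 thanos-compact[1053544]: level=error '
        'msg="critical error detected; halting" err="compaction: group '
        '0@11257394797428657513: compact blocks '
        '[/srv/thanos-compact/compact/0@11257394797428657513/01KFHBRFDPPK0JMZPQFKTT1Y40 '
        '/srv/thanos-compact/compact/0@11257394797428657513/01KFHQ5VK2MP972WE4E14EQXRD]: '
        '2 errors: add series: symbol table size exceeds 4294967295 bytes: 4982340497"'
    )
    line = last_halt_line(["unrelated line", sample])
    for argv in mark_commands(ulids_from_halt_line(line)):
        print(shlex.join(argv))
```

One consequence of the greedy regex is worth noting. When a halt message contains several bracketed lists, like the Feb 26 one that also carries overlap details, only the paths from the last list are picked up.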
root@titan2001:~# systemctl restart thanos-compact.service
Sat, Feb 28
Feb 26 22:44:53 titan2001 thanos-compact[2096946]: ts=2026-02-26T22:44:53.899137199Z caller=compact.go:559 level=error msg="critical error detected; halting" err="compaction: 2 errors: group 300000@2015487672410861213: pre compaction overlap check: overlaps found while gathering blocks. [mint: 1762732800000, maxt: 1762905600000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ8C9BV9TDJRFJ4WXWJ1JYC5, mint: 1762387200000, maxt: 1762905600000, range: 144h0m0s>, <ulid: 01K9VVXTVR1PTF1DXTBQ6FZBTW, mint: 1762732800000, maxt: 1762905600000, range: 48h0m0s>\n[mint: 1763078400000, maxt: 1763424000000, range: 96h0m0s, blocks: 2]: <ulid: 01KJ8G05G55J7SZWNFQ0Y5HGS9, mint: 1763078400000, maxt: 1763596800000, range: 144h0m0s>, <ulid: 01KAQWEKESYTBPRPQDKD2RYQ2X, mint: 1763078400000, maxt: 1763424000000, range: 96h0m0s>\n[mint: 1763596800000, maxt: 1763942400000, range: 96h0m0s, blocks: 2]: <ulid: 01KJ2X3K5EXEAQAWSC01ZHZD3M, mint: 1763596800000, maxt: 1763942400000, range: 96h0m0s>, <ulid: 01KBSQZ0NAAD4ZXWDSH63VED58, mint: 1763596800000, maxt: 1763942400000, range: 96h0m0s>\n[mint: 1766016000001, maxt: 1767225600000, range: 335h59m59s, blocks: 2]: <ulid: 01KJ3W4RSQMGVXHQH4SBWN9HPZ, mint: 1766016000001, maxt: 1767225600000, range: 335h59m59s>, <ulid: 01KDZWJKBBN3ETNQADX5JCSMRB, mint: 1766016000001, maxt: 1767225600000, range: 335h59m59s>\n[mint: 1761868800000, maxt: 1762214400000, range: 96h0m0s, blocks: 2]: <ulid: 01KJ83M1QDFJ0EGFRVW5SH7S78, mint: 1761868800000, maxt: 1762387200000, range: 144h0m0s>, <ulid: 01K9JFKZ5Y70SNWSHCVQXHZHHF, mint: 1761868800000, maxt: 1762214400000, range: 96h0m0s>\n[mint: 1763424000000, maxt: 1763596800000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ8G05G55J7SZWNFQ0Y5HGS9, mint: 1763078400000, maxt: 1763596800000, range: 144h0m0s>, <ulid: 01KAKDHWR3R1XNHR05MV3NRF8C, mint: 1763424000000, maxt: 1763596800000, range: 48h0m0s>\n[mint: 1764288000000, maxt: 1764633600000, range: 96h0m0s, blocks: 2]: <ulid: 01KJ8SCY4GHZAVV4T995RZS2DG, mint: 1764115200000, maxt: 
1764806400000, range: 192h0m0s>, <ulid: 01KBT7GEGR283SQX4Z30ZSE99S, mint: 1764288000000, maxt: 1764633600000, range: 96h0m0s>\n[mint: 1764806400000, maxt: 1766016000000, range: 336h0m0s, blocks: 2]: <ulid: 01KDCJ3M6BM082MEQJYVPV2CE4, mint: 1764806400000, maxt: 1766016000000, range: 336h0m0s>, <ulid: 01KJ8X474PBTJ8GNRD8NFBVYPH, mint: 1764806400000, maxt: 1766016000000, range: 336h0m0s>\n[mint: 1760486400000, maxt: 1761004800000, range: 144h0m0s, blocks: 2]: <ulid: 01K8E6Y8R2R6RSRK57XS5256K0, mint: 1760486400000, maxt: 1761004800000, range: 144h0m0s>, <ulid: 01KJ780CRES6SNXJCZ0YD3ESR7, mint: 1760486400000, maxt: 1761177600000, range: 192h0m0s>\n[mint: 1761523200000, maxt: 1761696000000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ7YQ7J4Q19D3QFSK80ZSA63, mint: 1761177600000, maxt: 1761696000000, range: 144h0m0s>, <ulid: 01K8QWZV1HKE3ED6N228X7ZP6V, mint: 1761523200000, maxt: 1761696000000, range: 48h0m0s>\n[mint: 1762214400000, maxt: 1762387200000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ83M1QDFJ0EGFRVW5SH7S78, mint: 1761868800000, maxt: 1762387200000, range: 144h0m0s>, <ulid: 01K9DHKE61Y5AE6V1Z1BCPNSTJ, mint: 1762214400000, maxt: 1762387200000, range: 48h0m0s>\n[mint: 1763942400000, maxt: 1764115200000, range: 48h0m0s, blocks: 2]: <ulid: 01KB01HHAX2710WXDVGK53RW61, mint: 1763942400000, maxt: 1764115200000, range: 48h0m0s>, <ulid: 01KJ1QBCD7PR45AH6RY8QYHPVG, mint: 1763942400000, maxt: 1764115200000, range: 48h0m0s>\n[mint: 1764633600000, maxt: 1764806400000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ8SCY4GHZAVV4T995RZS2DG, mint: 1764115200000, maxt: 1764806400000, range: 192h0m0s>, <ulid: 01KBNW9E3DQYG4Y189Z6KS2PKV, mint: 1764633600000, maxt: 1764806400000, range: 48h0m0s>\n[mint: 1758758400000, maxt: 1759276800000, range: 144h0m0s, blocks: 2]: <ulid: 01KJ23YVV02BAQJNFAP1SWFQP1, mint: 1758758400000, maxt: 1759276800000, range: 144h0m0s>, <ulid: 01KHRNCPS3MBKWBTYM8A8F9JNM, mint: 1758758400000, maxt: 1759449600000, range: 192h0m0s>\n[mint: 1759449600000, maxt: 1759795200000, 
range: 96h0m0s, blocks: 2]: <ulid: 01K7C3HK0RM901JYJRNRH4VW94, mint: 1759449600000, maxt: 1759795200000, range: 96h0m0s>, <ulid: 01KJ6E26QNK22BNX1HZ0G5QYF2, mint: 1759449600000, maxt: 1759968000000, range: 144h0m0s>\n[mint: 1759968000000, maxt: 1760313600000, range: 96h0m0s, blocks: 2]: <ulid: 01KJ2P5XHHN1VPFC7T0EJEC5HJ, mint: 1759968000000, maxt: 1760313600000, range: 96h0m0s>, <ulid: 01KHV02F6A4WB8NQ4CK09VWV7V, mint: 1759968000000, maxt: 1760486400000, range: 144h0m0s>\n[mint: 1761177600000, maxt: 1761523200000, range: 96h0m0s, blocks: 2]: <ulid: 01KJ7YQ7J4Q19D3QFSK80ZSA63, mint: 1761177600000, maxt: 1761696000000, range: 144h0m0s>, <ulid: 01K9J90PPJA8KXTD6GTRAZKHZF, mint: 1761177600000, maxt: 1761523200000, range: 96h0m0s>\n[mint: 1762387200000, maxt: 1762732800000, range: 96h0m0s, blocks: 2]: <ulid: 01KAQKRTHG7743PA4XAFZ8DYBK, mint: 1762387200000, maxt: 1762732800000, range: 96h0m0s>, <ulid: 01KJ8C9BV9TDJRFJ4WXWJ1JYC5, mint: 1762387200000, maxt: 1762905600000, range: 144h0m0s>\n[mint: 1762905600000, maxt: 1763078400000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ15PF4N8QYCYR1XC4S7BRTX, mint: 1762905600000, maxt: 1763078400000, range: 48h0m0s>, <ulid: 01KA1A4XVD699TPV1MBD9CVD9N, mint: 1762905600000, maxt: 1763078400000, range: 48h0m0s>\n[mint: 1764115200000, maxt: 1764288000000, range: 48h0m0s, blocks: 2]: <ulid: 01KB4ZZ0GRDE9KCFKASYWVAH47, mint: 1764115200000, maxt: 1764288000000, range: 48h0m0s>, <ulid: 01KJ8SCY4GHZAVV4T995RZS2DG, mint: 1764115200000, maxt: 1764806400000, range: 192h0m0s>\n[mint: 1759795200000, maxt: 1759968000000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ6E26QNK22BNX1HZ0G5QYF2, mint: 1759449600000, maxt: 1759968000000, range: 144h0m0s>, <ulid: 01K77KMHYKHT78NMQ9K2Z9PXAJ, mint: 1759795200000, maxt: 1759968000000, range: 48h0m0s>\n[mint: 1761004800000, maxt: 1761177600000, range: 48h0m0s, blocks: 2]: <ulid: 01KJ780CRES6SNXJCZ0YD3ESR7, mint: 1760486400000, maxt: 1761177600000, range: 192h0m0s>, <ulid: 01K89M9V91VBZYN2NX98T2PNDM, mint: 1761004800000, 
maxt: 1761177600000, range: 48h0m0s>\n[mint: 1761696000000, maxt: 1761868800000, range: 48h0m0s, blocks: 2]: <ulid: 01K8X1RE8GR0ABPA8878KDVWZE, mint: 1761696000000, maxt: 1761868800000, range: 48h0m0s>, <ulid: 01KJ0E56AHXF02N15VNPSCKNP7, mint: 1761696000000, maxt: 1761868800000, range: 48h0m0s>; group 300000@11257394797428657513: compact blocks [/srv/thanos-compact/compact/300000@11257394797428657513/01KJ0A98QFVAERZ2SW7PRK7GDM /srv/thanos-compact/compact/300000@11257394797428657513/01KJ0K2R2ZSSRKXYAQXPW2JH8F /srv/thanos-compact/compact/300000@11257394797428657513/01KJ0Q8GGTTE4WGMM1YDAY5KTX /srv/thanos-compact/compact/300000@11257394797428657513/01KJ4TTBYXCV65H4K0VSDXF6NS]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 8741100040; symbol table size exceeds 4294967295 bytes: 8741100040"
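The overlap error above lists pairs of blocks in the same group whose time ranges intersect. Each pair is one of two kinds:

- a larger block (144h or 192h) overlapping a smaller one, or
- two blocks with identical ranges.

Below is a simplified pairwise version of that check, run on three blocks copied from the message. It is not Thanos' actual implementation. Treating `maxt` as exclusive is my assumption, and `find_overlaps`/`fmt_range` are hypothetical helpers.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    ulid: str
    mint: int  # milliseconds since the epoch
    maxt: int  # milliseconds since the epoch, treated as exclusive (assumption)


def find_overlaps(blocks: List[Block]) -> List[Tuple[int, int, List[str]]]:
    """Return (mint, maxt, ulids) for each pair of blocks whose ranges intersect."""
    found = []
    ordered = sorted(blocks, key=lambda b: b.mint)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            lo, hi = max(a.mint, b.mint), min(a.maxt, b.maxt)
            if lo < hi:  # blocks that merely touch (a.maxt == b.mint) do not overlap
                found.append((lo, hi, [a.ulid, b.ulid]))
    return found


def fmt_range(ms: int) -> str:
    """Format a millisecond span like the log does, e.g. 172800000 -> '48h0m0s'."""
    hours, rem = divmod(ms // 1000, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h{minutes}m{seconds}s"


if __name__ == "__main__":
    blocks = [
        Block("01KJ8C9BV9TDJRFJ4WXWJ1JYC5", 1762387200000, 1762905600000),
        Block("01K9VVXTVR1PTF1DXTBQ6FZBTW", 1762732800000, 1762905600000),
        Block("01KAQKRTHG7743PA4XAFZ8DYBK", 1762387200000, 1762732800000),
    ]
    for lo, hi, ulids in find_overlaps(blocks):
        print(f"[mint: {lo}, maxt: {hi}, range: {fmt_range(hi - lo)}]: {ulids}")
```

With these three blocks, the sketch reports the 96h and 48h overlaps that appear in the message. The 01KAQKRTHG7743PA4XAFZ8DYBK / 01K9VVXTVR1PTF1DXTBQ6FZBTW pair only touches at 1762732800000, so it is not flagged.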
Fri, Feb 27
We could consider setting bots to use direct messages to reduce the amount of chatter in the channel without losing notification/highlight functionality outright. Offhand:
Thu, Feb 26
Wed, Feb 25
I can grab this. We have recently racked the mwlog[12]003 hardware and can tackle the hardware refresh and the trixie upgrade at the same time.
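Host groups in this feed are written with a bracketed shorthand (mwlog[12]003, titan200[12]). I'm assuming it follows glob-style character classes, where each character inside the brackets is one alternative. Under that assumption, a small expansion helper could look like the sketch below. Ranges such as `[1-3]` are deliberately not handled, and `expand_hosts` is a hypothetical helper, not an existing tool.

```python
import re
from itertools import product
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand glob-style classes: 'mwlog[12]003' -> ['mwlog1003', 'mwlog2003']."""
    # re.split with a capturing group keeps the bracket contents in the result:
    # even indices are literal text, odd indices are the characters of a [...] class.
    parts = re.split(r"\[([^\]]+)\]", pattern)
    choices = [[part] if i % 2 == 0 else list(part) for i, part in enumerate(parts)]
    # One host name per combination of choices, in order.
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for pattern in ("mwlog[12]003", "titan200[12]"):
        print(pattern, "->", expand_hosts(pattern))
```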
Mon, Feb 23
Thu, Feb 19
Thanks @Jhancock.wm! RAM upgrades on titan200[12] look good!
I'll be available all day today (Eastern time). If that works, ping me on IRC when you're ready to start on titan2001. I'll depool and shut down the host for you. FYI, after rebooting it takes about 30 minutes for services to fully reload, so we should wait that long before moving on to the next host.
Wed, Feb 18
It looks like this would be the first Kafka cluster on trixie. For Kafka 3.5 we would likely bring 7.5.12-1 (3.5) to trixie, along with an appropriate JDK version. I'm not sure offhand which JDK version is recommended for 3.5.
Tue, Feb 17
Excellent! What is a good start time for you today? I can depool titan1001 ahead of that. I should also mention that the titan hosts can take about an hour to reach a green state after restarting. We could stagger this as rolling maintenance over a couple of hours or, alternatively, over a couple of days, whichever is easier from the datacenter perspective. Thanks!
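To compare staggering over hours versus days, it helps to sketch a strictly sequential schedule. Each host is depooled and worked on, then given time to return to green before the next one starts. The one-hour settle time comes from the comment above. The start time, the per-host work duration, the host list, and the `rolling_schedule` helper are all hypothetical placeholders.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def rolling_schedule(
    hosts: List[str], start: datetime, work: timedelta, settle: timedelta
) -> List[Tuple[str, datetime, datetime]]:
    """Return (host, depool_at, next_host_may_start_at) for one-at-a-time maintenance."""
    schedule = []
    t = start
    for host in hosts:
        ready = t + work + settle  # back to green only after the work plus settle time
        schedule.append((host, t, ready))
        t = ready  # the next host waits until this one is green again
    return schedule


if __name__ == "__main__":
    start = datetime(2026, 2, 17, 14, 0)  # hypothetical start time
    work = timedelta(hours=1)  # hypothetical duration of the datacenter work per host
    settle = timedelta(hours=1)  # settle time from the comment above (about an hour)
    for host, depool_at, ready_at in rolling_schedule(["titan1001", "titan2001"], start, work, settle):
        print(f"{host}: depool {depool_at:%H:%M}, green by ~{ready_at:%H:%M}")
```

Total wall-clock time grows linearly with the number of hosts. That growth is the trade-off between squeezing the work into one afternoon and spreading it over several days.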
Hi @VRiley-WMF, sure. By the way, were you able to reclaim any more RAM?
Fri, Feb 13
titan2001:~# journalctl -u thanos-compact | grep halt | tail -n 1 | sed -nr 's/^.*\[(.*)\].*$/\1/p' | tr -s ' ' '\n' | awk -F '/' '{print $NF}' | xargs -I % thanos tools bucket --objstore.config-file /etc/thanos-compact/objstore.yaml mark --id=% --marker=no-compact-mark.json --details="compactor halted due to size"
ts=2026-02-13T19:24:09.4810147Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-02-13T19:24:09.883338265Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KFDRK14WJRPDK0ANJT01XVMX
ts=2026-02-13T19:24:09.883373278Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KFDRK14WJRPDK0ANJT01XVMX
ts=2026-02-13T19:24:09.883404009Z caller=main.go:174 level=info msg=exiting
ts=2026-02-13T19:24:09.90964839Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-02-13T19:24:10.284998705Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KGQ9N12TEE5HD4QAMH0NNTSR
ts=2026-02-13T19:24:10.285037134Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KGQ9N12TEE5HD4QAMH0NNTSR
ts=2026-02-13T19:24:10.285071936Z caller=main.go:174 level=info msg=exiting
ts=2026-02-13T19:24:10.323480343Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-02-13T19:24:10.715128045Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KFV16XPTVSDXK24M7X7J27J4
ts=2026-02-13T19:24:10.715167028Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KFV16XPTVSDXK24M7X7J27J4
ts=2026-02-13T19:24:10.715202207Z caller=main.go:174 level=info msg=exiting
ts=2026-02-13T19:24:10.740656721Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-02-13T19:24:11.20937999Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KFV9FDXQN7PGAJZEKF74Q21C
ts=2026-02-13T19:24:11.209414608Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KFV9FDXQN7PGAJZEKF74Q21C
ts=2026-02-13T19:24:11.209452482Z caller=main.go:174 level=info msg=exiting
ts=2026-02-13T19:24:11.234464264Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2026-02-13T19:24:11.674423726Z caller=block.go:406 level=info msg="block has been marked for no compaction" block=01KGQMFF377VXBJTMXTDR2D0RG
ts=2026-02-13T19:24:11.674470577Z caller=tools_bucket.go:1134 level=info msg="marking done" marker=no-compact-mark.json IDs=01KGQMFF377VXBJTMXTDR2D0RG
ts=2026-02-13T19:24:11.674507674Z caller=main.go:174 level=info msg=exiting
Thu, Feb 12
I've depooled frontend services on titan2001 and plan to free the space used by this thanos-store instance as a stopgap measure to keep the compactor running.
Fri, Feb 6
Feb 06 21:29:32 titan2001 thanos-compact[1053544]: ts=2026-02-06T21:29:32.814266612Z caller=compact.go:559 level=error msg="critical error detected; halting" err="compaction: group 0@11257394797428657513: compact blocks [/srv/thanos-compact/compact/0@11257394797428657513/01KFHBRFDPPK0JMZPQFKTT1Y40 /srv/thanos-compact/compact/0@11257394797428657513/01KFHQ5VK2MP972WE4E14EQXRD]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 4982340497; symbol table size exceeds 4294967295 bytes: 4982340497"
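For scale, 4294967295 in these messages is exactly 2^32 - 1, the largest value an unsigned 32-bit integer can hold. My reading, which I have not verified against the Prometheus TSDB source, is that the index writer caps the symbol table at a 32-bit size. The quick check below shows how far over that cap the two failed compactions in this feed were. The labels in `REPORTED` are just my shorthand.

```python
from typing import Dict

SYMBOL_TABLE_LIMIT = 2**32 - 1  # 4294967295, max unsigned 32-bit value

# Sizes reported in the Feb 6 and Feb 26 halt messages in this feed.
REPORTED: Dict[str, int] = {
    "2026-02-06 group 0@11257394797428657513": 4982340497,
    "2026-02-26 group 300000@11257394797428657513": 8741100040,
}


def overshoot(size: int) -> float:
    """How many times larger than the limit a symbol table is."""
    return size / SYMBOL_TABLE_LIMIT


if __name__ == "__main__":
    for label, size in REPORTED.items():
        print(f"{label}: {size} bytes, {overshoot(size):.2f}x the limit")
```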
Feb 6 2026
omitted these blocks from compaction for now
Feb 4 2026
! In T414579#11581131, @tappof wrote:
- Templates SLO manifests and allows default values (e.g. default alert state)
- Allows sweeping changes centrally (e.g. introduce new tag, change window, etc.)
These are definitely pros of keeping the definitions in the Puppet repository. With a dedicated repository, we can set up a CI pipeline to highlight sensitive values that differ from the defaults.
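Here is a toy illustration of the two points above. First, manifests are rendered from central defaults, so a sweeping change is a one-line edit to the defaults. Second, it shows the kind of check a CI job in a dedicated repository could run to flag sensitive values that differ from those defaults. The keys, values, and helper names are all hypothetical. This is neither sloth's spec format nor our Puppet code.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical central defaults; changing "window" here would change every rendered SLO.
DEFAULTS: Dict[str, str] = {"window": "30d", "alerting": "disabled"}


def render(overrides: Dict[str, str]) -> Dict[str, str]:
    """Apply per-SLO overrides on top of the shared defaults."""
    manifest = dict(DEFAULTS)
    manifest.update(overrides)
    return manifest


def sensitive_diffs(
    manifest: Dict[str, str], sensitive: Set[str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {key: (default, actual)} for sensitive keys that deviate from the defaults."""
    return {
        key: (DEFAULTS.get(key), value)
        for key, value in manifest.items()
        if key in sensitive and DEFAULTS.get(key) != value
    }


if __name__ == "__main__":
    manifest = render({"alerting": "page"})
    for key, (default, actual) in sensitive_diffs(manifest, {"alerting", "window"}).items():
        print(f"CI note: {key} is {actual!r} (default {default!r})")
```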
Feb 3 2026
For the purposes of sloth onboarding, I'd strongly prefer that SLO definitions continue to live in Puppet, for several reasons:
Feb 2 2026
The sloth 0.15.0 package has been built via GitLab CI and uploaded to apt:
Closing this as pilot onboarding has finished; wider onboarding will be tracked in the parent task!
The SLO WG has decided to proceed with a production rollout of sloth, which will be tracked in the parent task!
Jan 22 2026
Thanks so much for sorting through this @Papaul and @Jclark-ctr! Yes, looks good to me; ready to revert to the reuse variant. Thanks again!
Jan 14 2026
Bumped the grafana VMs in codfw and eqiad to 12G (up from 4G) and used this as an opportunity to increase VCPUs to 2 as well.
Dec 17 2025
Dec 16 2025
Dec 8 2025
Updated the wikifunctions slot pilot SLO to enable low priority "ticket" alerting
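Here is some context on what the "ticket" severity means in practice. As far as I know, sloth's default alerts follow the multi-window, multi-burn-rate scheme from the Google SRE Workbook. In that scheme, a burn rate is the fraction of error budget consumed divided by the alert window's share of the SLO period. The sketch below assumes a 30-day period and those workbook values, and shows only the long windows. The 99.9 objective is a made-up example rather than the wikifunctions objective. Please check the sloth docs before relying on the exact numbers.

```python
from datetime import timedelta
from typing import List, Tuple

SLO_PERIOD = timedelta(days=30)  # assumption: 30-day SLO period

# (severity, fraction of error budget consumed, long alert window); workbook-style values.
ALERT_WINDOWS: List[Tuple[str, float, timedelta]] = [
    ("page", 0.02, timedelta(hours=1)),
    ("page", 0.05, timedelta(hours=6)),
    ("ticket", 0.10, timedelta(days=1)),
    ("ticket", 0.10, timedelta(days=3)),
]


def burn_rate(budget_fraction: float, window: timedelta, period: timedelta = SLO_PERIOD) -> float:
    """Burn rate needed to consume `budget_fraction` of the budget within `window`."""
    # timedelta / timedelta gives a float, e.g. 30 days / 1 hour = 720.0
    return budget_fraction * (period / window)


def error_ratio_threshold(objective_pct: float, rate: float) -> float:
    """Error ratio that corresponds to burning the budget at `rate`."""
    return rate * (1 - objective_pct / 100)


if __name__ == "__main__":
    objective = 99.9  # hypothetical objective, for illustration only
    for severity, fraction, window in ALERT_WINDOWS:
        rate = burn_rate(fraction, window)
        threshold = error_ratio_threshold(objective, rate)
        print(f"{severity:6} {window}: burn rate {rate:.1f}, error ratio > {threshold:.4%}")
```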
Dec 4 2025
onboarded wikifunctions today as well with config:
Dec 2 2025
Dec 1 2025
Made a couple more adjustments to https://grafana.wikimedia.org/d/slot-pilot-slo-detail/sloth-s-l-o-detail to clean up the rolling window portion
Agreed, looks good!
Nov 25 2025
https://gerrit.wikimedia.org/r/1211177, Elukey on Patchset 1 (11:50 AM): I think it could be a good test, but I would try to explain why we get the difference outlined in https://w.wiki/GHoH, because IIUC we should really see the drops in the first place. Maybe there is something extra that we are not seeing?