Majavah (Taavi Väänänen)
User

User Details

User Since
Feb 24 2019, 3:58 PM (169 w, 3 d)
Availability
Available
IRC Nick
taavi
LDAP User
Majavah
MediaWiki User
Majavah [ Global Accounts ]

Recent Activity

Today

Majavah edited projects for T303359: Remove items from Meta-Wiki page [[Special:Contact/requestlicense]], added: Wikimedia-Site-requests, WikimediaMessages; removed MediaWiki-extensions-ContactPage.
Wed, May 25, 8:25 PM · WikimediaMessages, Wikimedia-Site-requests, WMF-Legal
Majavah added a comment to T304328: Move Termbox SSR for Beta Wikidata into deployment-prep project.

This new instance is failing to run Puppet:

taavi@deployment-termbox-ssr:~$ sudo run-puppet-agent
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Info: Retrieving pluginfacts
Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve file metadata for puppet:///pluginfacts: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Info: Retrieving plugin
Error: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Error: /File[/var/lib/puppet/lib]: Could not evaluate: Could not retrieve file metadata for puppet:///plugins: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Error: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]

Please fix. All deployment-prep instances must be fully configured via Puppet and not by hand / separate Ansible cookbooks.

Wed, May 25, 5:01 PM · Patch-For-Review, wmde-team-a-tech, User-ItamarWMDE, Cloud-VPS (Debian Stretch Deprecation), Wikidata-Termbox, Beta-Cluster-Infrastructure, Wikidata, wdwb-tech
Majavah merged task T309220: Can't review translated Dev Portal messages in TranslateWiki due to exception of type Wikimedia\Rdbms\DBQueryError into T309219: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" for Oldversion and History pages.
Wed, May 25, 3:42 PM · MediaWiki-extensions-Translate, translatewiki.net
Majavah merged T309220: Can't review translated Dev Portal messages in TranslateWiki due to exception of type Wikimedia\Rdbms\DBQueryError into T309219: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" for Oldversion and History pages.
Wed, May 25, 3:42 PM · translatewiki.net
Majavah created T309214: openstack: cleanup firewall rules.
Wed, May 25, 2:57 PM · cloud-services-team (Kanban), Cloud-VPS
Majavah created T309201: Clean up osm_host from firewall rules and puppet manifests.
Wed, May 25, 1:33 PM · User-dcaro, Technical-Debt, Cloud-VPS, cloud-services-team (Kanban)
Majavah changed E614: ArchCom RFC Meeting: Reading List service (2017-06-14, #wikimedia-office) to repeat until Tue, May 24, 9:00 PM.
Wed, May 25, 8:36 AM

Yesterday

Majavah added a comment to T308989: Refill tool stuck "waiting for an available worker".

If the worker is using a custom k8s deployment, consider configuring liveness/readiness probes to make Kubernetes restart the container when it gets stuck.

Tue, May 24, 2:57 PM · Tool-refill
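The probe suggestion above can be sketched as a container spec. This is only an illustration: the container name, image, port and /healthz path are hypothetical, and it assumes the worker exposes an HTTP health endpoint (an exec probe would be the alternative if it does not). The probe field names are the standard Kubernetes ones.

```python
import json


def container_with_probes(name, image, port, health_path="/healthz"):
    """Build a Kubernetes container spec dict with liveness and readiness probes.

    All argument values used below are hypothetical placeholders.
    """
    probe = {
        "httpGet": {"path": health_path, "port": port},
        "periodSeconds": 10,
        "timeoutSeconds": 5,
    }
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # The kubelet restarts the container once the liveness probe has
        # failed failureThreshold times in a row.
        "livenessProbe": {**probe, "initialDelaySeconds": 30, "failureThreshold": 3},
        # A failing readiness probe only removes the pod from Service
        # endpoints; it does not restart the container.
        "readinessProbe": {**probe, "initialDelaySeconds": 5, "failureThreshold": 1},
    }


spec = container_with_probes("worker", "example/worker:latest", 8000)
print(json.dumps(spec, indent=2))
```

Only the liveness probe triggers a restart, so a stuck worker needs that one in particular; the readiness probe is optional for a process that receives no Service traffic.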

Mon, May 23

Majavah closed T308955: Maintain kubeusers duplicates context and certificate entries in kubeconfig if the context is not named toolforge as Resolved.

Fixed so maintain-kubeusers won't generate any new broken configs.

Mon, May 23, 2:14 PM · Cloud-Services
Majavah committed rLTMK6dbeb507c6ca: user: fix renewals on paws / toolsbeta (authored by Majavah).
user: fix renewals on paws / toolsbeta
Mon, May 23, 2:04 PM
Majavah added a comment to T308995: wikibugs has stopped showing phab/gerrit comments on IRC as of 2022-05-22Z17:00.

This looks very similar to T304180 and T291129.

2022-05-23 13:13:42,815 - irc3.wikibugs - DEBUG - Register plugin 'irc3.plugins.ctcp.CTCP'
2022-05-23 13:13:42,826 - irc3.wikibugs - DEBUG - Register plugin 'irc3.plugins.autojoins.AutoJoins'
2022-05-23 13:13:42,859 - irc3.wikibugs - DEBUG - Register plugin 'irc3.plugins.sasl.Sasl'
2022-05-23 13:13:42,924 - irc3.wikibugs - DEBUG - Starting wikibugs...
2022-05-23 13:13:43,207 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:13:43,208 - irc3.wikibugs - DEBUG - CONNECT ping-pong ()
2022-05-23 13:14:24,806 - irc3.wikibugs - CRITICAL - connection lost (139787222388544): None
2022-05-23 13:14:24,809 - irc3.wikibugs - CRITICAL - closing old transport (139787222388544)
2022-05-23 13:14:26,812 - irc3.wikibugs - DEBUG - Starting wikibugs...
2022-05-23 13:14:27,360 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:14:27,361 - irc3.wikibugs - DEBUG - CONNECT ping-pong ()
2022-05-23 13:15:10,604 - irc3.wikibugs - CRITICAL - connection lost (139787222129536): None
2022-05-23 13:15:10,606 - irc3.wikibugs - CRITICAL - closing old transport (139787222129536)
2022-05-23 13:15:12,608 - irc3.wikibugs - DEBUG - Starting wikibugs...
2022-05-23 13:15:12,884 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:15:12,885 - irc3.wikibugs - DEBUG - CONNECT ping-pong ()
2022-05-23 13:15:56,808 - irc3.wikibugs - CRITICAL - connection lost (139787222129824): None
2022-05-23 13:15:56,810 - irc3.wikibugs - CRITICAL - closing old transport (139787222129824)
2022-05-23 13:15:58,813 - irc3.wikibugs - DEBUG - Starting wikibugs...
2022-05-23 13:15:59,034 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:15:59,035 - irc3.wikibugs - DEBUG - CONNECT ping-pong ()
2022-05-23 13:16:40,593 - irc3.wikibugs - CRITICAL - connection lost (139787222129920): None
2022-05-23 13:16:40,594 - irc3.wikibugs - CRITICAL - closing old transport (139787222129920)
2022-05-23 13:16:42,597 - irc3.wikibugs - DEBUG - Starting wikibugs...
2022-05-23 13:16:42,869 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:16:42,870 - irc3.wikibugs - DEBUG - CONNECT ping-pong ()
Mon, May 23, 1:19 PM · SRE, Wikibugs
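To quantify the reconnect loop in the log above, the following sketch measures how long each connection survived. The log lines are copied from the excerpt; the function name and parsing approach are illustrative and not part of wikibugs.

```python
from datetime import datetime

# "Connected" / "connection lost" lines copied from the log excerpt above.
LOG = """\
2022-05-23 13:13:43,207 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:14:24,806 - irc3.wikibugs - CRITICAL - connection lost (139787222388544): None
2022-05-23 13:14:27,360 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:15:10,604 - irc3.wikibugs - CRITICAL - connection lost (139787222129536): None
2022-05-23 13:15:12,884 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:15:56,808 - irc3.wikibugs - CRITICAL - connection lost (139787222129824): None
2022-05-23 13:15:59,034 - irc3.wikibugs - DEBUG - Connected
2022-05-23 13:16:40,593 - irc3.wikibugs - CRITICAL - connection lost (139787222129920): None
"""


def connection_lifetimes(log_text):
    """Return seconds between each 'Connected' and the next 'connection lost'."""
    lifetimes = []
    connected_at = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # The timestamp is everything before the first " - " separator.
        stamp, _, rest = line.partition(" - ")
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if rest.endswith("Connected"):
            connected_at = when
        elif "connection lost" in rest and connected_at is not None:
            lifetimes.append((when - connected_at).total_seconds())
            connected_at = None
    return lifetimes


lifetimes = connection_lifetimes(LOG)
print(lifetimes)
```

For this excerpt every connection lasts between roughly 41 and 44 seconds, a regular interval that may point to a timeout rather than random network drops.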
Majavah created T309014: sentinel and puppet overwriting toolforge redis config.
Mon, May 23, 10:38 AM · Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro, Toolforge, cloud-services-team (Kanban)
Majavah closed T308895: GlobalRename not renaming some accounts as Resolved.
Mon, May 23, 8:19 AM · MW-1.39-notes (1.39.0-wmf.12; 2022-05-16), Stewards-and-global-tools, MediaWiki-extensions-CentralAuth, GlobalRename

Sun, May 22

Majavah committed rLTMK14c586eaa831: user: handle empty kubeconfig files (authored by Majavah).
user: handle empty kubeconfig files
Sun, May 22, 5:19 PM
Majavah closed T308982: tools-sgeexec-0940 down as Resolved.
Sun, May 22, 4:49 PM · Toolforge
Majavah created T308982: tools-sgeexec-0940 down.
Sun, May 22, 4:39 PM · Toolforge
Majavah created P28268 (An Untitled Masterwork).
Sun, May 22, 11:11 AM

Sat, May 21

Majavah added projects to T308954: Auto-generated info page links needs updating: incubator.wikimedia.org, MediaWiki-extensions-WikimediaIncubator.
Sat, May 21, 9:34 PM · MediaWiki-extensions-WikimediaIncubator, incubator.wikimedia.org
Majavah created T308941: Klaxon redirects to http://klaxon.wikimedia.org (not https).
Sat, May 21, 7:20 PM · Sustainability (Incident Followup), Patch-For-Review, SRE
Majavah added a comment to T308895: GlobalRename not renaming some accounts.

Surprisingly, this doesn't seem to have caused any unattached local accounts, since the renameuser_status protections in LocalRenameJob prevented the actual renames from happening on the wiki where the rename was requested from.

Sat, May 21, 5:36 PM · MW-1.39-notes (1.39.0-wmf.12; 2022-05-16), Stewards-and-global-tools, MediaWiki-extensions-CentralAuth, GlobalRename
Majavah triaged T308927: quibble-vendor-mysql-php72-selenium-docker: "cannot create directory ‘log’: Permission denied" as High priority.
Sat, May 21, 3:40 PM · Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure), Continuous-Integration-Infrastructure, Continuous-Integration-Config
Majavah placed T308914: Display "Page Not Found" for incorrect request up for grabs.
Sat, May 21, 2:54 PM · VideoCutTool
Majavah triaged T308895: GlobalRename not renaming some accounts as Unbreak Now! priority.

That seems indeed the cause: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CentralAuth/+/refs/heads/master/includes/Special/SpecialGlobalRenameQueue.php#637

Sat, May 21, 1:21 PM · MW-1.39-notes (1.39.0-wmf.12; 2022-05-16), Stewards-and-global-tools, MediaWiki-extensions-CentralAuth, GlobalRename
Majavah added a comment to T308895: GlobalRename not renaming some accounts.

@Zabe I wonder if this can be caused by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/774972? That could explain the Username@somewiki suffix that we're seeing in the rename logs?

Sat, May 21, 1:15 PM · MW-1.39-notes (1.39.0-wmf.12; 2022-05-16), Stewards-and-global-tools, MediaWiki-extensions-CentralAuth, GlobalRename

Wed, May 18

Majavah added a comment to T308381: toolforge: Scrape Kubernetes controller-manager and apiserver metrics into Prometheus.

Useful reading:

Wed, May 18, 6:04 PM · User-dcaro, Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah added a comment to T308381: toolforge: Scrape Kubernetes controller-manager and apiserver metrics into Prometheus.

Poking around to learn how this is handled in the production k8s clusters might be helpful? There are some teaser docs at https://wikitech.wikimedia.org/wiki/Kubernetes/Metrics. Those docs also point to https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

Wed, May 18, 5:39 PM · User-dcaro, Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah added a comment to T308682: Cleanup two LDAP users with invalid `cn` attributes.

I'm happy to fix it myself if it helps, but thought it might be best to simply create a ticket and tag it with LDAP and SRE to begin with.

Wed, May 18, 3:46 PM · SRE, LDAP
Majavah added a watcher for LDAP: Majavah.
Wed, May 18, 3:44 PM
Majavah added a comment to T308682: Cleanup two LDAP users with invalid `cn` attributes.

Those are not invalid values; those are just people whose usernames contain non-ASCII characters. Our existing stack fully supports them, and I'd argue that any software that does not like non-ASCII values in usernames is bugged and should be fixed.

Wed, May 18, 3:43 PM · SRE, LDAP
Majavah added a comment to T305847: Migrate SRE paging alerts off Icinga and to Alertmanager.

How was the above list generated? It's missing WMCS paging alerts, AIUI everything sent to the wmcs-team contact group is paging even if it has page => false set.

Wed, May 18, 1:37 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2021/2022-Q4)
Majavah added a comment to P27926 Error on first puppet run (went away on second run).

Anyhow, our work in T308601: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins should fix this issue entirely, so it's probably not worth figuring out what caused this.

Wed, May 18, 12:19 PM
Majavah added a comment to P27926 Error on first puppet run (went away on second run).

Was this on a cloud vps vm or a production host?

Wed, May 18, 12:16 PM
Majavah committed rLPRI22507156dd3f: Add dummy authdns keys to fix PCC (authored by Majavah).
Add dummy authdns keys to fix PCC
Wed, May 18, 12:08 PM
Majavah closed T308555: Provide access to Thanos as Invalid.

Neither Toolforge nor Cloud VPS currently offers a managed Prometheus-style monitoring or alerting service. There have been talks of building one for a while, but so far other projects have consumed our very limited engineering resources.

Wed, May 18, 7:36 AM · Toolforge
Majavah added projects to T308601: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins: Cloud-VPS, Puppet, Observability-Alerting.
Wed, May 18, 7:08 AM · Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro, Infrastructure-Foundations, Observability-Alerting, Puppet, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban)

Tue, May 17

Dzahn awarded T308486: openstack-browser stopped showing puppet classes in use a Like token.
Tue, May 17, 3:06 PM · Tool-openstack-browser
Majavah closed T301993: [toolsdb] Enable gtid to help replication recovery, a subtask of T301949: ToolsDB upgrade => Bullseye, MariaDB 10.4, as Resolved.
Tue, May 17, 2:16 PM · Data-Persistence (Consultation), Cloud-VPS (Debian Stretch Deprecation), cloud-services-team (Kanban), Toolforge, Data-Services
Majavah closed T301993: [toolsdb] Enable gtid to help replication recovery as Resolved.

Looks good:

MariaDB [(none)]> show slave status\G
[...]
                  Using_Gtid: No
                  Gtid_IO_Pos:
[...]
Tue, May 17, 2:16 PM · Patch-For-Review, Data-Services, Cloud-Services-Worktype-Maintenance, Cloud-Services-Origin-Team, cloud-services-team (Kanban), User-dcaro
Majavah closed T301993: [toolsdb] Enable gtid to help replication recovery, a subtask of T301951: toolsdb: full disk on clouddb1001 broke clouddb1002 (secondary) replication, as Resolved.
Tue, May 17, 2:16 PM · Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, cloud-services-team (Kanban), Toolforge, Data-Services
Majavah closed T308486: openstack-browser stopped showing puppet classes in use as Resolved.
Tue, May 17, 5:29 AM · Tool-openstack-browser
Majavah closed T308486: openstack-browser stopped showing puppet classes in use, a subtask of T274666: Add keystone auth middleware to the puppet enc api, as Resolved.
Tue, May 17, 5:29 AM · Patch-For-Review, User-Majavah, cloud-services-team (Kanban), Cloud-VPS
Majavah committed R2073:81343c373338: puppetclasses: Fix keystone authentication (authored by Majavah).
puppetclasses: Fix keystone authentication
Tue, May 17, 5:26 AM

Mon, May 16

Majavah edited projects for T308486: openstack-browser stopped showing puppet classes in use, added: Tool-openstack-browser; removed Toolforge.
Mon, May 16, 7:46 PM · Tool-openstack-browser
Majavah closed T306494: Some of parameters in MediaWiki:Blockedtext are not parsed on Special:CreateAccount as Resolved.

The fix will be included in this week's train.

Mon, May 16, 6:27 PM · MW-1.39-notes (1.39.0-wmf.13; 2022-05-23), I18n, MediaWiki-Authentication-and-authorization
Majavah added a comment to T301993: [toolsdb] Enable gtid to help replication recovery.

This is clouddb1002 (secondary) after setting gtid_domain_id on both servers:

MariaDB [(none)]> SELECT @@GLOBAL.gtid_slave_pos;
+------------------------------------------------------+
| @@GLOBAL.gtid_slave_pos                              |
+------------------------------------------------------+
| 0-2886731673-33519859088,2886731673-2886731673-18688 |
+------------------------------------------------------+
1 row in set (0.00 sec)

Is it intentional that there are two entries and the first one starts with 0-?

Mon, May 16, 3:09 PM · Patch-For-Review, Data-Services, Cloud-Services-Worktype-Maintenance, Cloud-Services-Origin-Team, cloud-services-team (Kanban), User-dcaro
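For reading the value above: a MariaDB GTID has the form domain_id-server_id-sequence, and gtid_slave_pos holds one GTID per replication domain. The sketch below splits the string from the query output into its parts; the type and function names are illustrative.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    sequence: int


def parse_gtid_list(value: str) -> List[Gtid]:
    """Split a comma-separated MariaDB GTID list into (domain, server, sequence)."""
    gtids = []
    for item in value.split(","):
        domain, server, sequence = (int(part) for part in item.strip().split("-"))
        gtids.append(Gtid(domain, server, sequence))
    return gtids


# Value copied from the SELECT @@GLOBAL.gtid_slave_pos output above.
positions = parse_gtid_list("0-2886731673-33519859088,2886731673-2886731673-18688")
for gtid in positions:
    print(gtid)
```

Read this way, the two entries belong to two different domains: 0, which is the default gtid_domain_id, and 2886731673. That would be consistent with the first entry covering events written before gtid_domain_id was set, though the task itself does not confirm this.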
Majavah updated the task description for T308013: Assign SPDX headers to puppet.git.
Mon, May 16, 2:30 PM · Patch-For-Review, Infrastructure-Foundations, SRE
Majavah placed T305008: Forcibly creating a local account causes autoblocks for the user to affect the creating administrator's IP address up for grabs.
Mon, May 16, 5:29 AM · CheckUser, MediaWiki-extensions-CentralAuth

Sun, May 15

Majavah set Due Date to Thu, Jun 2, 9:00 PM on T308402: toolforge: Refresh certs that are not controlled by kubeadm (mid 2022 edition).
Sun, May 15, 12:31 PM · cloud-services-team (Kanban), Toolforge
Majavah triaged T308402: toolforge: Refresh certs that are not controlled by kubeadm (mid 2022 edition) as High priority.
Sun, May 15, 12:31 PM · cloud-services-team (Kanban), Toolforge
Majavah added a member for Stewards-and-global-tools: Majavah.
Sun, May 15, 11:10 AM

Sat, May 14

Majavah added a comment to T308388: User unable to merge accounts.

I don't think this needs to block the account renaming; manual debugging on production confirms that CentralAuthUser::listAttached() includes those accounts so they shouldn't cause issues during the rename. (This is also why using Special:MergeAccount did nothing.)

@Majavah to confirm, I am okay to attempt to process this rename?

Sat, May 14, 6:50 PM · Wikimedia-maintenance-script-run, MediaWiki-extensions-CentralAuth
Majavah removed a project from T308388: User unable to merge accounts: MediaWiki-User-management.

This seems to be a display bug with Special:CentralAuth: we have some 'corrupted' localuser rows for this specific account that look otherwise good but have lu_attached_method set as NULL. SpecialCentralAuth uses the centralauth-admin-unattached message as a placeholder for missing data.

Sat, May 14, 6:08 PM · Wikimedia-maintenance-script-run, MediaWiki-extensions-CentralAuth
Majavah closed T308102: Delete Cloud VPS projects ores and ores-staging as Resolved.
Sat, May 14, 11:33 AM · Cloud-VPS (Project-requests), cloud-services-team (Kanban)
Majavah claimed T308102: Delete Cloud VPS projects ores and ores-staging.
Sat, May 14, 11:28 AM · Cloud-VPS (Project-requests), cloud-services-team (Kanban)
Majavah added a subtask for T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade: T308381: toolforge: Scrape Kubernetes controller-manager and apiserver metrics into Prometheus.
Sat, May 14, 10:51 AM · cloud-services-team (Kanban), Toolforge
Majavah added a parent task for T308381: toolforge: Scrape Kubernetes controller-manager and apiserver metrics into Prometheus: T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade.
Sat, May 14, 10:51 AM · User-dcaro, Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah created T308381: toolforge: Scrape Kubernetes controller-manager and apiserver metrics into Prometheus.
Sat, May 14, 10:51 AM · User-dcaro, Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah added a comment to T292945: Automate kubeadm config change deployment.

A simple automation for step 3 would be to simply replace the 3 control nodes with new ones, as the cluster join process creates the manifest files from what's stored in the config maps.

Sat, May 14, 10:41 AM · cloud-services-team (Kanban), Toolforge
Majavah claimed T308300: toolforge-jobs: scheduled jobs stopped being scheduled.

Looks like the Kubernetes cronjob scheduler may be getting overloaded at midnight given how many tools are running jobs at that point. I've increased the scheduling tolerance of your jobs to make Kubernetes start your jobs even if that means it'll be a little off the scheduled date. However, a better solution would be to, if possible, run hourly/daily jobs at a random time (say, 18:37 daily) instead of exactly midnight or top of the hour.

Sat, May 14, 10:34 AM · User-dcaro, Toolforge
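One way to follow the advice above about avoiding midnight and top-of-the-hour schedules is to derive a stable pseudo-random time of day from the job name. This is a minimal sketch, not something Toolforge provides; the function name and the job name are hypothetical.

```python
import hashlib


def spread_daily_schedule(job_name: str) -> str:
    """Return a daily cron expression at a stable, pseudo-random time of day.

    Hashing the job name keeps the schedule the same across redeploys while
    spreading different jobs over the whole day.
    """
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    # Map the first four bytes of the hash onto the 1440 minutes of a day.
    offset = int.from_bytes(digest[:4], "big") % (24 * 60)
    hour, minute = divmod(offset, 60)
    return f"{minute} {hour} * * *"


schedule = spread_daily_schedule("example-tool-daily")
print(schedule)
```

Because the time comes from a hash rather than from random.randint, the same job always lands on the same minute, which keeps the schedule predictable for its maintainers.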

Fri, May 13

Majavah added a comment to T308283: Beta Cluster Tech Decision Forum.

You are reading that right.

Fri, May 13, 7:30 PM · Release-Engineering-Team (Radar), tech-decision-forum
Majavah added a comment to T308283: Beta Cluster Tech Decision Forum.

Am I reading this correctly that the proposed Beta replacement would be directly using and updating production data and also share the production bottlenecks (such as databases)?

Fri, May 13, 5:18 PM · Release-Engineering-Team (Radar), tech-decision-forum
Majavah renamed T307943: Update Kubernetes clusters to v1.23 from Update Kubernets clusters to v1.23 to Update Kubernetes clusters to v1.23.
Fri, May 13, 1:24 PM · Kubernetes, Prod-Kubernetes, serviceops
Majavah closed T305317: Request membership in extension-HidePrefix group for Zoranzoki21 as Resolved.

Done.

Fri, May 13, 1:18 PM · Gerrit-Privilege-Requests
Legoktm awarded T308203: Create a Toolschecker check for K8s cronjobs a The World Burns token.
Fri, May 13, 4:44 AM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge

Thu, May 12

Majavah updated the task description for T308203: Create a Toolschecker check for K8s cronjobs.
Thu, May 12, 5:02 PM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah closed T308263: tools.wmcz: cannot use crontab as Resolved.

Looks like this was caused by an unrelated Puppet change. Fixed by updating the hiera data in Horizon.

Thu, May 12, 4:56 PM · cloud-services-team (Kanban), Cloud-Services-Origin-User, Toolforge
Majavah claimed T308263: tools.wmcz: cannot use crontab.
Thu, May 12, 4:53 PM · cloud-services-team (Kanban), Cloud-Services-Origin-User, Toolforge
Majavah closed T308204: toolforge-jobs should set startingDeadlineSeconds by default, a subtask of T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade, as Resolved.
Thu, May 12, 1:27 PM · cloud-services-team (Kanban), Toolforge
Majavah closed T308204: toolforge-jobs should set startingDeadlineSeconds by default as Resolved.
Thu, May 12, 1:27 PM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah closed T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade as Resolved.

Everything seems to be working again properly. I've filed some actionables and marked those as subtasks, so closing this task.

Thu, May 12, 1:20 PM · cloud-services-team (Kanban), Toolforge
Majavah closed T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade, a subtask of T282942: Upgrade Toolforge Kubernetes to latest 1.21, as Resolved.
Thu, May 12, 1:20 PM · cloud-services-team (Kanban), Toolforge
Majavah closed T308205: Re-enable CronJobControllerV2, a subtask of T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade, as Resolved.
Thu, May 12, 12:40 PM · cloud-services-team (Kanban), Toolforge
Majavah closed T308205: Re-enable CronJobControllerV2 as Resolved.

Seems to have worked fine. Tentatively closing.

Thu, May 12, 12:40 PM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah added a comment to T308205: Re-enable CronJobControllerV2.

This might be useful: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/19-Graduate-CronJob-to-Stable

Thu, May 12, 6:25 AM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah created T308205: Re-enable CronJobControllerV2.
Thu, May 12, 5:36 AM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah triaged T308204: toolforge-jobs should set startingDeadlineSeconds by default as High priority.
Thu, May 12, 5:34 AM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah raised the priority of T308203: Create a Toolschecker check for K8s cronjobs from High to Needs Triage.
Thu, May 12, 5:28 AM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah triaged T308203: Create a Toolschecker check for K8s cronjobs as High priority.
Thu, May 12, 5:27 AM · Sustainability (Incident Followup), cloud-services-team (Kanban), Toolforge
Majavah added a subtask for T282942: Upgrade Toolforge Kubernetes to latest 1.21: T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade.
Thu, May 12, 5:18 AM · cloud-services-team (Kanban), Toolforge
Majavah added a parent task for T308189: Toolforge jobs stopped getting scheduled around the same time as the Toolforge k8s cluster upgrade: T282942: Upgrade Toolforge Kubernetes to latest 1.21.
Thu, May 12, 5:18 AM · cloud-services-team (Kanban), Toolforge

Wed, May 11

Majavah updated the task description for T295190: Upgrade all third-party Toolforge Kubernetes components to versions supporting Kubernetes 1.22.
Wed, May 11, 7:03 PM · cloud-services-team (Kanban), Toolforge
Majavah added a subtask for T286856: Upgrade Toolforge Kubernetes to latest 1.22: T308172: Upgrade PAWS to Kubernetes 1.21.
Wed, May 11, 6:34 PM · cloud-services-team (Kanban), Toolforge
Majavah added a parent task for T308172: Upgrade PAWS to Kubernetes 1.21: T286856: Upgrade Toolforge Kubernetes to latest 1.22.
Wed, May 11, 6:34 PM · cloud-services-team (Kanban), PAWS
Majavah added a parent task for T282942: Upgrade Toolforge Kubernetes to latest 1.21: T308172: Upgrade PAWS to Kubernetes 1.21.
Wed, May 11, 6:34 PM · cloud-services-team (Kanban), Toolforge
Majavah added a subtask for T308172: Upgrade PAWS to Kubernetes 1.21: T282942: Upgrade Toolforge Kubernetes to latest 1.21.
Wed, May 11, 6:34 PM · cloud-services-team (Kanban), PAWS
Majavah created T308172: Upgrade PAWS to Kubernetes 1.21.
Wed, May 11, 6:34 PM · cloud-services-team (Kanban), PAWS
Majavah closed T282942: Upgrade Toolforge Kubernetes to latest 1.21 as Resolved.
Wed, May 11, 6:33 PM · cloud-services-team (Kanban), Toolforge
Majavah closed T282942: Upgrade Toolforge Kubernetes to latest 1.21, a subtask of T286856: Upgrade Toolforge Kubernetes to latest 1.22, as Resolved.
Wed, May 11, 6:33 PM · cloud-services-team (Kanban), Toolforge
Majavah added a comment to T308102: Delete Cloud VPS projects ores and ores-staging.

Hey! I still see a VM in the ores project, is it ok to delete that too?

taavi@cloudcontrol1004 ~ $ os server list --project ores
+--------------------------------------+-------------+--------+----------------------------------------+--------------------------------------------+-----------------------+
| ID                                   | Name        | Status | Networks                               | Image                                      | Flavor                |
+--------------------------------------+-------------+--------+----------------------------------------+--------------------------------------------+-----------------------+
| c0252f8f-6953-4d08-9ffc-57f8dcf7ba18 | calbon-test | ACTIVE | lan-flat-cloudinstances2b=172.16.0.200 | debian-10.0-buster (deprecated 2020-10-16) | g2.cores1.ram2.disk20 |
+--------------------------------------+-------------+--------+----------------------------------------+--------------------------------------------+-----------------------+
Wed, May 11, 7:15 AM · Cloud-VPS (Project-requests), cloud-services-team (Kanban)
Majavah renamed T308102: Delete Cloud VPS projects ores and ores-staging from Delete Horizon projects ores and ores-staging to Delete Cloud VPS projects ores and ores-staging.
Wed, May 11, 7:12 AM · Cloud-VPS (Project-requests), cloud-services-team (Kanban)

Tue, May 10

Majavah added a comment to T290494: Revisit Toolforge automated package updates and version pinnings.

Oh, the kernel pinnings don't work at all. This is an older host that does not use the cloud image

Tue, May 10, 3:31 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Majavah added a comment to T290494: Revisit Toolforge automated package updates and version pinnings.

Oh, the kernel pinnings don't work at all. This is an older host that does not

taavi@tools-k8s-worker-42:~ $ uname -a
Linux tools-k8s-worker-42 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
taavi@tools-k8s-control-2:~ $ kubectl sudo get node tools-k8s-worker-42 -o wide
NAME                  STATUS                     ROLES    AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION    CONTAINER-RUNTIME
tools-k8s-worker-42   Ready,SchedulingDisabled   <none>   2y72d   v1.20.11   172.16.1.74   <none>        Debian GNU/Linux 10 (buster)   4.19.0-14-amd64   docker://20.10.8
Tue, May 10, 3:25 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Majavah added a comment to T290494: Revisit Toolforge automated package updates and version pinnings.

Just enabled unattended-upgrades. That still leaves the apt pinnings. Also note that the kernel pinning does not work for new hosts: all bullseye hosts and some buster ones use the cloud kernel variants, which our pinnings don't seem to apply to.

Tue, May 10, 2:18 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge

Mon, May 9

Majavah added a comment to T127607: Fix canonical namespaces for rowiki.
taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php --wiki rowiki --fix
0 pages to fix, 0 were resolvable.
Mon, May 9, 1:32 PM · Wikimedia-Site-requests

Fri, May 6

Majavah added a project to T307800: mysql collation utf8mb4_0900_ai_ci does not work correctly: Wikimedia-Rdbms.
Fri, May 6, 6:28 PM · MediaWiki-Installer
Majavah closed T306594: Jobs stuck in delete state on Toolforge as Resolved.

Done. Sorry for the delay!

Fri, May 6, 6:08 PM · Toolforge
Majavah removed a project from T307768: Outreachdashboard application Internal Sever error: Wikimedia-production-error.
Fri, May 6, 10:58 AM · Education-Program-Dashboard
Majavah added a comment to T307648: Audit database usage of GlobalBlocking extension.
  • Its schema is not optimal, the block reason and actor can be normalized.
    • You could normalize the actor name and comment to the actor id in metawiki but that would couple this database to metawiki's database.
    • You could probably instead normalize to the global user id of the actor in central auth and make comment a set of pre-defined values (1 = 'Open proxy', or something like that)
Fri, May 6, 9:07 AM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), GlobalBlocking, Sustainability (Incident Followup), DBA

Thu, May 5

Majavah closed T307693: Bad Request error when starting a Kubernetes service as Resolved.

Looks like I managed to sneak in a bug in the last webservice update which broke starting all Java web services. Fixed now, sorry about that!

Thu, May 5, 5:27 PM · Toolforge, Kubernetes
Majavah edited projects for T307693: Bad Request error when starting a Kubernetes service, added: Toolforge; removed Tools.
Thu, May 5, 2:36 PM · Toolforge, Kubernetes