May 14 13:25:03 moscovium RT[3653758]: [3653758] RT::Handle=HASH(0x7f0e51f826e8) couldn't execute the query 'SELECT main.* FROM CustomRoles main JOIN ObjectCustomRoles ObjectCustomRo> DBIx::SearchBuilder::Handle::SimpleQuery(RT::Handle=HASH(0x7f0e51f826e8), "SELECT main.* FROM CustomRoles main JOIN ObjectCustomRoles O> DBIx::SearchBuilder::_DoSearch(RT::CustomRoles=HASH(0x7f0e801f27d0)) called at /usr/share/request-tracker4/lib/RT/SearchBuilder.pm line> RT::SearchBuilder::_DoSearch(RT::CustomRoles=HASH(0x7f0e801f27d0)) called at /usr/share/perl5/DBIx/SearchBuilder.pm line 513
Today
Yesterday
May 14 16:00:02 aphlict1002 systemd[1]: aphlict_logrotate.service: Succeeded.
May 14 16:00:02 aphlict1002 systemd[1]: logrotate.service: Succeeded.
May 14 17:00:02 aphlict1002 systemd[1]: aphlict_logrotate.service: Succeeded.
May 14 17:00:02 aphlict1002 systemd[1]: logrotate.service: Succeeded.
May 14 18:00:02 aphlict1002 systemd[1]: logrotate.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
May 14 18:00:02 aphlict1002 systemd[1]: logrotate.service: Failed with result 'exit-code'.
May 14 18:00:02 aphlict1002 systemd[1]: aphlict_logrotate.service: Succeeded.
May 14 19:00:02 aphlict1002 systemd[1]: aphlict_logrotate.service: Succeeded.
May 14 19:00:02 aphlict1002 systemd[1]: logrotate.service: Succeeded.
May 14 20:00:02 aphlict1002 systemd[1]: aphlict_logrotate.service: Succeeded.
May 14 20:00:02 aphlict1002 systemd[1]: logrotate.service: Succeeded.
[aphlict1002:~] $ sudo systemctl status logrotate
● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: inactive (dead) since Tue 2024-05-14 20:00:02 UTC; 9min ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 1519778 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=0/SUCCESS)
   Main PID: 1519778 (code=exited, status=0/SUCCESS)
        CPU: 25ms
Ah, right. Sorry, mixed up 2 different sets of hosts.
@Jhancock.wm cc: @RLazarus I depooled the server and set a downtime of 24 hours.
Lucas is right. I can confirm the test passes with any wikimedia.org subdomain as long as the path stays /wiki/Main_Page and starts failing as expected once that path changes.
scap runs httpbb /srv/deployment/httpbb-tests/appserver/* --hosts=mwdebug.discovery.wmnet --https_port=4444 --retry_on_timeout
Also see T364773
follow-ups are happening on the parent tasks
Thanks for confirming that, @Tarrow. Good to know we have a workaround.
Mon, May 13
@ecarg Your user is now in the deployment group on the deployment server. Give it about 30 minutes and you should have all the access needed for an actual deployment.
@JMeybohm I noticed I can't manually run puppet agent on this host; it says I don't have sudo privileges for it. So I think puppet never ran to set up the initial users.
Fri, May 10
In T363415#9778509, @Dzahn wrote: This still needs https://gerrit.wikimedia.org/r/c/operations/puppet/+/1026193 to be merged to be able to call it resolved.
T364656 will be about upgrading / replacing the production deployment servers.
@akosiaris I made T364656 and suggest seeing that either as a parent task or simply merging this in there.
Looks like the timeout is already fully puppetized and in Hiera:
Current unit file of the docker-gc service:
Is there a setting that can be changed to allow more docker-gc.service failures within a particular window before alerting?
Thanks for the manager approval. I will upload a patch and assign it to the group approver for consideration :)
Things I think we can delete, at a first glance:
Thu, May 9
Wed, May 8
22:54 < jinxer-wm> RESOLVED: SystemdUnitFailed: wmf_auto_restart_envoyproxy.service on contint1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
added envoy to contint1003 to fix T364510
Recently https://gerrit.wikimedia.org/r/c/operations/puppet/+/1028796 was merged which adds auto-restart service for envoyproxy.
This is the test server for releng from T358237
@Tarrow You could check whether it works with ssh -o RequiredRSASize=1024, just for debugging or until we can fix this.
quoting from a serverfault.com question: "If your getting the "Invalid key length" error, the problem isn't your Ciphers (that may be it's own problem, but if you're getting a key, SSH has agreed to a Cipher)"
Here is the gerrit -> github replication config, it's in hieradata/common/profile/gerrit.yaml in the puppet repo.
Tue, May 7
fwiw, for whoever adds the production puppet role to this later: this has only recently become possible and should be mostly unblocked now (details in T363415). It still needs one more patch, though, and your review of it would be great.
@Mcastro Please confirm if you approve
@thcipriani please consider for approval (https://wikimedia.namely.com/people/eaebb898-01ba-404e-8cf8-2ed33c4e0d04/show/personal/employee-information/)