I am revisiting this task 9 years later.
Today
I have deployed it and verified the link now only has the change number. Thank you for the suggestion!
Note that MediaWiki scap deployments exit with status 1 because parse1002.eqiad.wmnet is down, which prevents docker_pull_k8s from completing successfully. All steps still run, but scap ultimately exits with status 1, which can be misleading (the deployment itself has worked).
I have deployed the patch for 1.43.0-wmf.1
parse1002.eqiad.wmnet is down / unreachable but is still in the pool of hosts to deploy to. That caused the MediaWiki train to fail overnight and is causing every MediaWiki deployment to error out due to a timeout when trying to reach that host.
Fri, Apr 19
I still have the issue when I post MediaWiki train related messages to both ops-l and wikitech-l. I then end up wondering whether the email reached both lists and have to head to the list archive on https://lists.wikimedia.org/postorius/lists/ to confirm.
Marking as resolved as we have agreed to skip this release since we do not use the Jenkins CLI and have it disabled.
Thu, Apr 18
As a side note, when a new patchset has been uploaded, after some minutes a popup shows "A newer patchset has been uploaded RELOAD DISMISS" and the console has:
show-alert: {"text":"A newer patch set has been uploaded"}
Clicking on it does a full reload and the console has:
button-click: {"path":"html.lightTheme>body>gr-alert>div.content-wrapper>gr-button.action>paper-button"}
Page: handleChangeRoute
ChangeReloaded: 845
ChangeFullyLoaded: 3075
I have tried with Gerrit 3.8.5 and it still does not work. The debug console has:
show-alert: {"text":"CI has completed checks. Reload the change view?"}
button-click: {"path":"html.lightTheme>body>gr-alert>div.content-wrapper>gr-button.action>paper-button"}
Less realistic: Per https://we.phorge.it/book/phorge/article/performance/ I realized that "The "xhprof" PHP extension is not available. Install xhprof to enable the XHProf console plugin. You can find instructions in the Installation Guide." but no package per https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1027123
It is now unlikely we will fully migrate out of Gerrit, and our obsolete CI (Zuul) currently hardcodes master as the default branch to fall back to. Thus for the purpose of MediaWiki development we are sticking to master for now. I am declining the task for now to reflect the reality.
There are a few more changes. I guess they are trivial ones since they did not make it into the release notes.
$ git log --oneline jenkins-2.440.2..jenkins-2.440.3
a9b85dcfe2 (tag: jenkins-2.440.3) [maven-release-plugin] prepare release jenkins-2.440.3
ef340a4492 (tag: jenkins-2.440.3-rc) Merge pull request #9113 from krisstern/feat/stable-2.440/backporting-2.440.3
387f5a600b Backport bundled plugin updates
f25c5d061e Bump Mina to 2.12.1 in the CLI (#9089)
1dba772b27 Bump org.springframework.security:spring-security-bom from 5.8.10 to 5.8.11 (#9047)
2ca228aac4 Bump org.springframework:spring-framework-bom from 5.3.32 to 5.3.33 (#9042)
0c9eb0c814 [JENKINS-72799] Apply `SlaveComputer.decorate` also to `openLogFile` (#9009)
57cab7aeef [JENKINS-72796] stable context classloader for Computer.threadPoolForRemoting (#9012)
7aaedac817 Update bundled trilead-api to 2.84.86.vf9c960e9b_458 (#9022)
713e4761d9 [maven-release-plugin] prepare for next development iteration
Wed, Apr 17
Upstream has released new versions of Gerrit on April 14th and made the issue I have filed public: https://issues.gerritcodereview.com/issues/321784734
The web editor breakage got filed as T362545: Inline editing of files no longer works in Gerrit and that is cache related.
It looks like the issue has been resolved in Gerrit 3.8. I can no longer reproduce it after upgrading from 3.7.x to 3.8.5.
The issue has been fixed by upgrading to Gerrit 3.8 on Monday 15th :)
I made a mistake on the change: it should have been attached to T228838.
In T362518#9719270, @Jdforrester-WMF wrote: This has also broken building CI images. Will have to migrate them to bullseye immediately, I suppose.
Tue, Apr 16
gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud can be deleted: it was shut down per T330312. I used it for Scap development and for testing Gerrit upgrades. Nothing is needed there.
gerrit-bullseye-test.devtools.eqiad1.wikimedia.cloud was created by @Dzahn in April 2023. I guess that was to reimage the Gerrit servers. We can get rid of it.
The instance got shut down because Gerrit uses LDAP to authenticate users, which goes against the WMCS policy. I have used that instance to test Gerrit upgrades and some scap/deployment improvements. There is nothing specific to recover and we can revisit later, probably setting up a dev environment similar to scap3-dev.
In T107254#9710082, @hashar wrote: As for CLOSED, I guess we did that to prevent further comments and/or to indicate the task resolution got verified. Then... I can't quite remember how we used Bugzilla 15/20 years ago :D
I have done the basic testing and Zuul seems to have been working on contint1003 which is Bullseye based. We can keep the instance around for some weeks in case something is broken, that might be handy.
refs/meta/config holds the Gerrit configuration for the repository and indeed cannot be deleted. The ORES repositories being synced from GitHub to Gerrit using Phabricator is a terrible case which kept causing trouble here and there, and we should remove that system (that is 5 years old T213246).
Mon, Apr 15
@Dreamy_Jazz awesome! Everything ends up easier when there is a very recent case since that means we still have the debug logs from Zuul. So here we go.
I have upgraded Gerrit this morning and if I remember well there are some issues with some caches not being invalidated somewhere in the stack. Thus eventually the browsers run outdated JavaScript which refers to assets that no longer exist :-\ It definitely happened in some recent previous upgrade.
The image is based on Buster since the Puppet master in production uses / used Debian Buster. That is to ensure we use / used the same version of Ruby as in production. I guess that image can be recreated targeting Bullseye / Bookworm.
Looks like the main one to switch was 2d46b067418cdeebdad384d612338598d386fb7b T271649
The upstream patches are https://gerrit-review.googlesource.com/q/Ie38535b2df123a62dfd6a6e4b4ee60a80b0254f3 but only got released starting from 3.8.
The fix is in v3.8.5, v3.9.4 and will be in v3.10.0
Fri, Apr 12
I have deployed the fix and, thanks to the reproducible case, I can confirm it fixed that specific error log :) Thank you!
Note: we have a dump of the Bugzilla data at https://dumps.wikimedia.org/other/bugzilla/ , as static HTML files and a database dump (without emails)
And... the script now takes seconds instead of weeks. Now we have enough extra Watts to shut down a small nuclear plant somewhere 🌈
In T359868#9702759, @gerritbot wrote: Change #1017958 merged by jenkins-bot:
[mediawiki/core@REL1_42] Bump PHPVersionCheck & composer expected PHP versions to 8.1.0
Thu, Apr 11
Train is unblocked (thank you @ovasileva!).
Oh that is nice, I guess I should edit Wikivoyage a bit more to be exempt from the bug. Thank you for the triage!
I have verified the fix on the debug server using my admin account on MediaWiki.org. I have used https://m.mediawiki.org/wiki/Special:Watchlist?debug=1 to nuke the resource loader cache.
Sorry, I was processing the backlog of errors that happened today and missed that it got fixed an hour or so ago! Thank you.
Hi @SCherukuwada you have a script account_vanishing_emailer.php running on mwmaint1002 which has some PHP fault and ends up emitting PHP warnings in production :) Can you please fix it? It might be wise to send the script to some git repository :)
When I hit https://en.m.wikivoyage.org/wiki/Special:Watchlist I get redirected to https://en.m.wikivoyage.org/wiki/Special:EditWatchlist and I do not get the top navigation bar with All Pages Talk Other nor the time/delta on the side :/ Or can it be fixed during the day? :)
@Jdlrobson said on the wiki page it might be a fallout of T358904 and thus https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MobileFrontend/+/1015636
After five years, it is never too late. This has hit us again this week while I was deploying the MediaWiki train and I guess it is finally time to address it and log all errors.
Wed, Apr 10
Oops, looks like I forgot to remove the Diffusion mirrors when I deleted the Gerrit repository. Thank you for the cleanup @Aklapper !
I previously filed a placeholder task T360178 for it but I have not investigated it because the example change I had was from 2013. If it happens on newer changes, well I guess something is indeed really broken :(
I have deployed the fix and confirmed it to be working. Thank you for the quick patch!
I am declining the task given it is on the radar and a 6-minute Kubernetes deployment seems to be the expected duration (due to the 3% chunks and the CPU usage pressure on the cluster as explained above in T360403#9641095).
@daniel I have deployed MediaWiki this morning ~ 8:30am UTC and thus it is already deployed on commons!
That is solved. I am waiting for a review of https://github.com/jenkinsci/collapsing-console-sections-plugin/pull/27 and can then cut a new release and get it deployed to our instance. Marking this stalled until it happens.
Tue, Apr 9
I have been added as a developer by Upstream https://github.com/jenkins-infra/repository-permissions-updater/pull/3874