+1
Yesterday
Tool-gitlab-account-approval is attempting to leverage existing allow lists from Gerrit, Phabricator, and Toolforge to establish trust in GitLab. My involvement with that project is part of why I started thinking about this general issue and its complexities.
The Developer account <-> SUL user mappings that Bitu is now maintaining in LDAP are likely to be of help with this project. Striker has a legacy feature that needs to be updated to also use LDAP for storage (T148048: Store Wikimedia unified account name (SUL) in LDAP directory). A third source of these mappings is right here in Phabricator where a given account can be linked to either a SUL account, a Developer account, or both.
In T364490#9781475, @Jdforrester-WMF wrote: Possibly caused by https://sal.toolforge.org/log/Aq_tWI8BGiVuUzOd6v44 (GitLab rebuild/restart for upstream security release)?
Figuring out how to add a reasonable health check to the continuous job would be good while we also look for lockup causes that we can correct.
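Below is a minimal sketch of one possible health check, built on a heartbeat file. The job's main loop touches a file whenever it makes progress, and a separate check reports unhealthy once that file goes stale. The file path, the `HEARTBEAT_FILE` variable, the 300-second threshold, and all function names are hypothetical. None of them come from the job's actual code or from the Toolforge jobs framework.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical defaults: neither the env var name nor the threshold comes from the task.
HEARTBEAT = Path(os.environ.get("HEARTBEAT_FILE", "/tmp/heartbeat"))
MAX_AGE_SECONDS = 300.0


def beat(path: Path = HEARTBEAT) -> None:
    """Record progress; call this from the job's main loop."""
    path.touch()


def is_healthy(path: Path = HEARTBEAT, max_age: float = MAX_AGE_SECONDS,
               now: Optional[float] = None) -> bool:
    """True if the heartbeat file exists and was touched within max_age seconds."""
    current = time.time() if now is None else now
    try:
        age = current - path.stat().st_mtime
    except FileNotFoundError:
        return False  # the job never started, or the file was cleaned up
    return age <= max_age


def main() -> int:
    """Exit status for a check script: 0 when healthy, 1 when the job looks stuck."""
    return 0 if is_healthy() else 1
```

A wrapper script could call `sys.exit(main())`. How that exit status would be hooked into the continuous job's health checking is left open here.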
Changes made:
- settings → repository → protected branches → main: allowed to merge == developers + maintainers
- manage → members → invite a group
- repos/mediawiki == developer
- repos/sre == developer
Tue, May 7
Mon, May 6
Sun, May 5
It's alive!
[10:35] < wikibugs> (reopen) bd808: This is a test of the gitlab irc reporter [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/33
Sat, May 4
The ingress-nginx configuration change proposed at https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/275 seems to fix the buffering issue:
+1
@Raymond_Ndibe, you might be interested in this one.
Fri, May 3
The issue seems to be related to HTTP/1.1 vs HTTP/2. As previously noted, curl -v --no-buffer -H 'Accept: text/event-stream' 'https://gitlab-webhooks.toolforge.org/sse/' works as expected:
+1
+1
+1
Tue, Apr 30
My very work-in-progress client code is now available at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/commit/7fcd6c83ae24326e2e3397970b6272fbd8002cdb.
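The https://gitlab-webhooks.toolforge.org/sse/ endpoint is read as a text/event-stream feed. A client therefore has to turn raw lines into events and act on each event as soon as its terminating blank line arrives.

Below is a minimal Python sketch of such a parser. It follows the field rules of the server-sent events format in the HTML specification: `data`, `event`, `id`, and comment lines starting with `:`. The `retry` field is ignored. This is an illustration only, not the linked wikibugs2 client, and the `merge_request` event type used in the example is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from decoded text/event-stream lines."""
    event_type = ""
    data: List[str] = []
    last_id = ""  # per the format, the last event ID persists across events
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data was seen.
            if data:
                yield {"event": event_type or "message", "data": "\n".join(data), "id": last_id}
            event_type, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip exactly one leading space
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value
        # "retry" and unknown fields are ignored in this sketch.
```

One way to feed the parser: build the request with `urllib.request.Request(url, headers={"Accept": "text/event-stream"})`, open it with `urllib.request.urlopen(...)`, and iterate over the response line by line. Then `parse_sse(line.decode("utf-8") for line in resp)` yields events as they are read. Note that `urllib` only speaks HTTP/1.1, so it would not exercise an HTTP/2 path.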
Mon, Apr 29
time="2024-04-29T23:05:40Z" level=debug msg="Trying to download "Welcome.pdf" with size 5679824" func=HandleDownloadSize file="bridgebot-matterbridge@v0.0.0-20240424042617-38c64944bf1d/bridge/helper/helper.go:162" prefix=telegram
[23:13] < wm-bb> <lucaswerkmeister> strange, I can’t find an existing upstream issue for it
[23:13] < wm-bb> <lucaswerkmeister> (I would assume it affects matterbridge in general)
[23:13] < wm-bb> <lucaswerkmeister> (but what do I know… *something* about tools-static in there is clearly custom ;))
[23:14] < Reedy> THOU SHALT MUST ONLY ONE UPLOAD AT A TIME
[23:15] < wm-bb> <lucaswerkmeister> “patience, my young padawan.”
The custom tools-static part is just this bit of config for MediaDownloadPath and MediaServerDownload.
In T360756#9755711, @Aklapper wrote: @brennen: Do you think this is worth a shot? (Assuming that git config --system --add safe.directory doesn't need to get puppetized or... whatever)
Redis 7, which is available in bookworm, includes the ability to configure client eviction when the server hits a memory pool limit. https://redis.io/docs/latest/develop/reference/clients/#client-eviction
Sun, Apr 28
This was done as part of T363028: Replace custom deployment with build service and job service. I also re-learned about T261988#8700389 during that project. :)
Looks like we forgot about this feature request. :(
Sat, Apr 27
I have submitted the patch that is working in our deploy upstream: https://github.com/42wim/matterbridge/pull/2138. As noted in that commit, voegelas came up with the working change in https://github.com/42wim/matterbridge/issues/1564#issuecomment-1693525232.
Fri, Apr 26
[02:04] wm-bb (~wm-bridge@wikimedia/bot/wm-bridgebot) left IRC (Quit: Bouncer quit)
[02:04] wm-bb (~wm-bridge@wikimedia/bot/wm-bridgebot) joined the channel
[02:04] ChanServ sets mode +v wm-bb
[02:05] < bd808> I just cycled wm-bb connection. Double messages to telegram now or ?
[02:06] < wm-bb> <bd808> omg! single message
Thu, Apr 25
In T363028#9746507, @bd808 wrote: I will also make a new task about trying to find a linter to validate the toml files to add to CI.
Such doc updates! Much wow! https://wikitech.wikimedia.org/w/index.php?title=Tool%3ABridgebot&diff=2172216&oldid=2169061
This work actually got done as part of T357729: wikibugs having a hard time staying connected to libera.chat IRC network. https://gitlab.wikimedia.org/toolforge-repos/wikibugs2-znc is now being used by Bridgebot as well.
The main problem in the deploy was that there was a cut-and-paste error in the bridgebot.toml config file. Once that was spotted and fixed things came up as hoped.
In T363296#9743894, @aborrero wrote: in your opinion, should we decline this task and focus on the other angle you mention?
The code and config are ready to try switching everything over. I don't want to do this in my evening, however, because exciting new failure modes could crop up after it has been running for a little while.
Wed, Apr 24
I have seen one crash on startup in testing but it was not repeatable. It looks like it was triggered by something the irc client saw in scrollback when attaching:
[0005] DEBUG irc: (/layers/heroku_go/go_deps/cache/gitlab.wikimedia.org/toolforge-repos/bridgebot-matterbridge@v0.0.0-20240424042617-38c64944bf1d/bridge/irc/handlers.go:117: github.com/42wim/matterbridge/bridge/irc.(*Birc).handleJoinPart) handle girc.Event{Source:(*girc.Source)(0xc0001f5fb0), Tags:girc.Tags{"time":"2024-04-24T23:19:17.667Z"}, Timestamp:time.Date(2024, time.April, 24, 23, 19, 17, 667000000, time.Local), Command:"JOIN", Params:[]string{"#wikimedia-cloud"}, Sensitive:false, Echo:false}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xb36c66]
In T363290#9742949, @matmarex wrote: I feel like people mostly use it for notifications, and don't intend to actually review all changes they signed up for.
Adding tiny shell wrappers for the Procfile to call seems to work around the issue.
$ ssh login.toolforge.org
$ become bridgebot
$ webservice buildservice shell --mount all -m 2G -c 1
$ /layers/heroku_go/go_target/bin/bridgebot -conf /app/etc/testing.toml
[0000] INFO router: (/layers/heroku_go/go_deps/cache/gitlab.wikimedia.org/toolforge-repos/bridgebot-matterbridge@v0.0.0-20240424042617-38c64944bf1d/gateway/router.go:66: github.com/42wim/matterbridge/gateway.(*Router).Start) Parsing gateway testing-irc-telegram
[0000] INFO router: (/layers/heroku_go/go_deps/cache/gitlab.wikimedia.org/toolforge-repos/bridgebot-matterbridge@v0.0.0-20240424042617-38c64944bf1d/gateway/router.go:75: github.com/42wim/matterbridge/gateway.(*Router).Start) Starting bridge: irc.testing ...
[21:07] < wm-bb> Does it work now?
[21:07] < bd808> omg, it did work!
The only thing that didn't seem to work is loading the remotenickformat.tengo script which I assumed was searched for relative to the config file. It looks like it is loaded relative to cwd instead so I will need to update a bit of config.
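The difference between the two lookups can be shown with `pathlib`. Below is a minimal sketch that resolves a relative script name against the directory of the config file rather than the process's working directory. The `resolve_next_to_config` and `resolve_against_cwd` helpers are hypothetical, not matterbridge code.

Per the observation above, matterbridge appears to use the working-directory behavior. In that case, an absolute path in the config, or starting the process from the config directory, would avoid the mismatch.

```python
from pathlib import Path


def resolve_next_to_config(config_path: str, script: str) -> Path:
    """Resolve a relative script path against the config file's directory."""
    script_path = Path(script)
    if script_path.is_absolute():
        return script_path  # absolute paths are used unchanged
    return Path(config_path).absolute().parent / script_path


def resolve_against_cwd(script: str) -> Path:
    """The behavior observed above: a relative path depends on the working directory."""
    return Path.cwd() / script
```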
We are a few versions behind on https://github.com/heroku/buildpacks-go, but I don't see anything in the commits or CHANGELOG that looks directly Procfile related. The latest tagged release also may not be compatible with pack (https://github.com/heroku/buildpacks-go/commit/111bb19806bb838c457ef1778a30487dc50f1cb0).
I love the idea of attempting to create a more equitable resource distribution for networking in Toolforge. I'm not sure yet, however, how this would actually work as hoped in practice unless there was deep integration with the Kubernetes scheduler.
In T329327#9739317, @aborrero wrote: I confirm this is not the case. The destination address of stream.wikimedia.org, as seen from within Toolforge, is exempt from the egress NAT on Cloud VPS. This means EventStreams gets to see the source IP address of the Toolforge kubernetes worker node.
However, there could be multiple pods within the same Toolforge k8s worker node connecting to the same endpoint, thus consuming the available slots.
This already happened, see T363296: toolforge: explore options to introduce egress network quotas.
Tue, Apr 23
In T308931#7950582, @Ottomata wrote: This happens when an IP opens up too many simultaneous connections to the service. The client IP is passed on to the eventstreams service from the Varnish frontend http servers, and then a running count of connections per eventstreams application instance is kept. If too many connections are made to that application instance, a 429 is returned. The current limit is 2 connections per application instance, of which there are 8 in each DC. So you should be able to open a total of 16 connections from your IP to a single DC (you will be routed to one of the two automatically based on your location). If you get this error, and are opening fewer than 16 connections, I'd expect a reconnect to eventually route you to an application instance where you have fewer than 2 connections open.
I suppose the more connections you have open at once, the more likely this is to happen.
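The arithmetic in the quoted explanation, and the intuition that more open connections make a 429 more likely, can be checked with a small simulation. This minimal sketch makes two assumptions:

- Each new connection is routed uniformly at random to one of the 8 instances in a DC.
- An instance rejects a client IP that already has 2 connections open there.

The real load balancing may differ. Because EventStreams sees the worker node's address (per T329327 above), these counts would be shared by all pods on the same node. The function names are hypothetical.

```python
import random
from typing import List, Optional

# Figures from the quoted EventStreams explanation.
PER_INSTANCE_LIMIT = 2
INSTANCES_PER_DC = 8


def rejection_probability(open_per_instance: List[int]) -> float:
    """Chance that one new connection gets a 429, assuming uniform routing."""
    full = sum(1 for n in open_per_instance if n >= PER_INSTANCE_LIMIT)
    return full / len(open_per_instance)


def open_connections(n: int, max_attempts: int = 50,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Simulate opening n connections from one IP, reconnecting after each 429."""
    rng = rng if rng is not None else random.Random()
    counts = [0] * INSTANCES_PER_DC
    for _ in range(n):
        for _attempt in range(max_attempts):
            i = rng.randrange(INSTANCES_PER_DC)  # the router picks an instance
            if counts[i] < PER_INSTANCE_LIMIT:
                counts[i] += 1
                break
        else:
            raise RuntimeError("gave up after repeated 429 responses")
    return counts
```

With k of the 8 instances already full, `rejection_probability` is k/8. The chance of a 429 on any single attempt therefore grows as more connections are held open. Under these assumptions, reconnecting still eventually succeeds as long as fewer than 2 × 8 = 16 connections are open.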
Adding a buildpack like https://github.com/kr/heroku-buildpack-inline to our stack might be an interesting way to implement this feature request. That solution would not readily support the use cases that use an existing buildpack, but it would provide tool maintainers with maximum flexibility with minimal overhead for building their own tools.
Sorry for the mess @taavi. Thank you for noticing the problem and reporting it.
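import logging
logging.basicConfig(level=logging.DEBUG)  # show DEBUG and above from every logger
logging.captureWarnings(True)  # route warnings.warn() output through the "py.warnings" logger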
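The self-contained check below confirms where captured warnings end up. With `logging.captureWarnings(True)`, output from `warnings.warn()` is logged at WARNING level on the `py.warnings` logger instead of being written directly to stderr. The `ListHandler` and `capture_demo` names are hypothetical.

```python
import logging
import warnings
from typing import List, Tuple


class ListHandler(logging.Handler):
    """Keep (logger name, level, message) tuples instead of printing them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Tuple[str, str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append((record.name, record.levelname, record.getMessage()))


def capture_demo() -> List[Tuple[str, str, str]]:
    """Emit one warning with capture enabled and return what the logger saw."""
    handler = ListHandler()
    logger = logging.getLogger("py.warnings")
    logger.addHandler(handler)
    logging.captureWarnings(True)
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("always")  # make sure the warning is not filtered out
            warnings.warn("example deprecation", DeprecationWarning)
    finally:
        logging.captureWarnings(False)  # restore the default warnings behavior
        logger.removeHandler(handler)
    return handler.records
```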
I recently learned that matterbridge uses the https://github.com/spf13/viper library and its AutomaticEnv feature when processing the config file. This allows envvars like MATTERBRIDGE_IRC_LIBERA_BRIDGEBOT_PASSWORD to be used to set secret values at runtime. The hope is that this will make conversion to a custom container image a bit simpler by removing the need for a custom interpolation system for the config file.
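The mapping can be illustrated with a simplified imitation in Python. This sketches the idea only; it is not viper itself. It assumes three rules, all consistent with the variable name above:

- Names use the prefix `MATTERBRIDGE`.
- Dots in a config key become underscores.
- Names are compared in upper case.

The key path `irc.libera.bridgebot.password` used in the example is hypothetical, as are the helper names. The sketch does no type conversion, and only keys already present in the config can be overridden.

```python
import os
from typing import Any, Dict, Mapping, Optional


def env_name(key: str, prefix: str = "MATTERBRIDGE") -> str:
    """Dotted config key -> env var name: add prefix, dots to underscores, upper-case."""
    return f"{prefix}_{key.replace('.', '_')}".upper()


def apply_env_overrides(config: Dict[str, Any],
                        environ: Optional[Mapping[str, str]] = None,
                        prefix: str = "MATTERBRIDGE",
                        parent_key: str = "") -> Dict[str, Any]:
    """Return a copy of a nested config where matching env vars replace leaf values."""
    env = os.environ if environ is None else environ
    result: Dict[str, Any] = {}
    for name, value in config.items():
        key = f"{parent_key}.{name}" if parent_key else name
        if isinstance(value, dict):
            result[name] = apply_env_overrides(value, env, prefix, key)
        else:
            # Env values are plain strings; no type conversion is attempted.
            result[name] = env.get(env_name(key, prefix), value)
    return result
```

Keeping the secret out of the TOML file this way means the config can be committed as-is, with only the environment differing between deployments.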
https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/29 looks to have fixed the unintended regression.
Mon, Apr 22
I can assert that webhook inputs were received by the running tool for both https://gitlab.wikimedia.org/repos/ci-tools/banana-checker/-/merge_requests/10 and https://gitlab.wikimedia.org/repos/ci-tools/libup/-/merge_requests/37