Graduate codesearch to production
Open, Stalled, High, Public

Description

codesearch has become a tool that developers increasingly rely upon. It's probably past time we graduate it into production.

Moving into production will bring other benefits, primarily allowing Gerrit to replicate Git repos instead of codesearch having to poll. Also better monitoring, etc.

The current architecture is roughly documented at https://www.mediawiki.org/wiki/Codesearch/Admin and is already fully puppetized.

Details

Related Changes in Gerrit:
Repo                             Branch      Lines +/-
integration/config               master      +8 -0
operations/container/codesearch  master      +95 K -1
operations/container/codesearch  master      +0 -3
operations/container/codesearch  master      +2 -0
operations/container/codesearch  master      +12 -4
operations/container/codesearch  master      +2 -0
integration/config               master      +9 -0
operations/container/codesearch  master      +6 -1
operations/container/codesearch  master      +21 -0
operations/container/codesearch  master      +7 -0
operations/deployment-charts     master      +3 -0
operations/dns                   master      +5 -0
operations/puppet                production  +15 -0
operations/puppet                production  +1 -0
operations/dns                   master      +5 -0
operations/puppet                production  +4 -0
operations/dns                   master      +1 -0
operations/puppet                production  +53 -5

Event Timeline

Now we have to upgrade the buster machines in cloud anyways, for T367479.

This is a good opportunity because it's basically a good chunk of the work also needed for this ticket.

Change #1043901 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] codesearch: add support for docker-ce on bookworm

https://gerrit.wikimedia.org/r/1043901

Change #1043901 abandoned by Dzahn:

[operations/puppet@production] codesearch: add support for docker-ce on bookworm

Reason:

in favor of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046724

https://gerrit.wikimedia.org/r/1043901

One thing to be done here:

add a systemd unit for the frontend. See T367479#9904556

I looked at this a bit. It should actually be rather easy, but it is a bit of work:
currently the CI pipeline does not publish Docker images for the frontend Docker build, only for codesearch itself: https://docker-registry.wikimedia.org/wikimedia/labs-codesearch/tags/

Once that's available, we just need to make a systemd service similar to other hound-* ones.
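As a rough illustration of what such a service could look like, here is a minimal Python sketch that renders a systemd unit wrapping `docker run`. It is not taken from the existing hound-* units or from the puppet code: the service name, image reference, and port numbers are all hypothetical placeholders, and the real unit would presumably be generated by puppet rather than by a script like this.

```python
from dataclasses import dataclass


@dataclass
class FrontendService:
    """Hypothetical settings for a container-backed frontend service."""
    name: str            # assumed unit/container name, e.g. "codesearch-frontend"
    image: str           # assumed image reference; no such image is published yet
    host_port: int       # assumed port exposed on the host
    container_port: int  # assumed port the frontend listens on inside the container


def render_unit(svc: FrontendService) -> str:
    """Render the text of a systemd unit that runs the container in the foreground."""
    return "\n".join([
        "[Unit]",
        f"Description={svc.name} (codesearch frontend container)",
        "After=docker.service",
        "Requires=docker.service",
        "",
        "[Service]",
        # The leading '-' tells systemd to ignore a failure here, which
        # happens when no stale container is left over from a previous run.
        f"ExecStartPre=-/usr/bin/docker rm -f {svc.name}",
        f"ExecStart=/usr/bin/docker run --rm --name {svc.name} "
        f"-p {svc.host_port}:{svc.container_port} {svc.image}",
        f"ExecStop=/usr/bin/docker stop {svc.name}",
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(render_unit(FrontendService(
        name="codesearch-frontend",
        image="registry.example/codesearch-frontend:latest",
        host_port=3003,
        container_port=80,
    )))
```

Running the container in the foreground with `--rm` lets systemd supervise the process directly, so `Restart=always` behaves as expected.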

Getting these images pushed to the Docker registry via CI is actually quite complicated, because codesearch pushes via blubber (I don't think there is any other way) and you can push only one image per repo. Not to mention that converting the frontend's Dockerfile to a blubber file is quite a lot of work :( Any ideas are welcome here.

Dropping php-mustache and libmustache for a pure PHP solution looks like it would greatly simplify your container build requirements.

Alternatively, build a base image via https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/production-images that includes those dependencies, so that you don't have to fight against Blubber's build-time abstractions. A third option, I guess, would be packaging libmustache and php-mustache for our local apt repo.

Just dumping some data here for comparison. The current codesearch VM, codesearch9, is a g4.cores4.ram8.disk20 running Debian bookworm.

We have been evaluating software for a refreshed codesearch and https://www.sourcebot.dev/ seems like a viable candidate.

Feature requests to track with Sourcebot upstream:

https://github.com/sourcebot-dev/sourcebot/issues/81 - search contexts / repo groups

https://github.com/sourcebot-dev/sourcebot/issues/165 - configurable repo names / "leading g"

https://github.com/sourcebot-dev/sourcebot/issues/166 - encode filtering in query params

LSobanski raised the priority of this task from Low to High.Jan 23 2025, 12:57 PM

Change #1126170 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] deployment_server/k8s: set kubeconfig files for codesearch

https://gerrit.wikimedia.org/r/1126170

Change #1126175 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/deployment-charts@master] create a namespace for codesearch on k8s-aux cluster

https://gerrit.wikimedia.org/r/1126175

Change #1126176 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] create codesearch.wikimedia.org, point to standard DYNA

https://gerrit.wikimedia.org/r/1126176

Change #1126177 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] add ingress service alias for codesearch on k8s-aux

https://gerrit.wikimedia.org/r/1126177

Change #1126182 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] create k8s-ingress-aux -ro and -rw discovery records, metafo/geodns

https://gerrit.wikimedia.org/r/1126182

Change #1126176 merged by Dzahn:

[operations/dns@master] create codesearch.wikimedia.org, point to standard DYNA

https://gerrit.wikimedia.org/r/1126176

Change #1128988 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] conftool-data: add codesearch service to discovery objects

https://gerrit.wikimedia.org/r/1128988

Change #1128989 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] servicecatalog: add codesearch in state service_setup

https://gerrit.wikimedia.org/r/1128989

Change #1126170 merged by Dzahn:

[operations/puppet@production] deployment_server/k8s: set kubeconfig files for codesearch

https://gerrit.wikimedia.org/r/1126170

Dzahn changed the task status from Open to In Progress.Mar 19 2025, 6:18 PM

Change #1126177 merged by Dzahn:

[operations/dns@master] add ingress service aliases for codesearch on k8s-aux

https://gerrit.wikimedia.org/r/1126177

Change #1128988 merged by Dzahn:

[operations/puppet@production] conftool-data: add codesearch service to discovery objects

https://gerrit.wikimedia.org/r/1128988

Change #1128989 merged by Dzahn:

[operations/puppet@production] servicecatalog: add codesearch in state service_setup

https://gerrit.wikimedia.org/r/1128989

Change #1126182 merged by Dzahn:

[operations/dns@master] create k8s-ingress-aux -ro and -rw discovery records, metafo/geodns

https://gerrit.wikimedia.org/r/1126182

Change #1126175 merged by Dzahn:

[operations/deployment-charts@master] create a namespace for codesearch on k8s-aux cluster

https://gerrit.wikimedia.org/r/1126175

Dzahn changed the task status from In Progress to Open.Apr 21 2025, 11:04 PM

In the linked task above I requested (and was granted) more quota on the codesearch Cloud VPS project.

I am going to double the RAM on the sourcebot1 test instance because it has been swapping and has recently become almost unusable (with a config that includes all repos and the latest version).

After resizing the instance to 16 GB RAM and upgrading to v4.0.1, the instance no longer swaps, the UI is usable, and so far it does not show any errors, even with the full config that fetches all of the > 4000 repos! :))

Created a new gitlab project called sourcebot under repos/sre to hold the docker files for letting CI build the docker image and push it to our registry.

Change #1166037 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] initial commit - add .gitreview file

https://gerrit.wikimedia.org/r/1166037

Change #1166037 merged by Dzahn:

[operations/container/codesearch@master] initial commit - add .gitreview file

https://gerrit.wikimedia.org/r/1166037

Change #1166044 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] add initial blubber .pipeline config and a README

https://gerrit.wikimedia.org/r/1166044

Change #1166044 merged by Dzahn:

[operations/container/codesearch@master] add initial blubber .pipeline config and a README

https://gerrit.wikimedia.org/r/1166044

Change #1166052 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] add blubber skeleton config, use base image nodejs

https://gerrit.wikimedia.org/r/1166052

Change #1166052 merged by Dzahn:

[operations/container/codesearch@master] add blubber skeleton config, use base image nodejs

https://gerrit.wikimedia.org/r/1166052

Change #1167288 had a related patch set uploaded (by Dzahn; author: Dzahn):

[integration/config@master] add pipelines for codesearch-sourcebot

https://gerrit.wikimedia.org/r/1167288

Change #1167289 had a related patch set uploaded (by Dzahn; author: Dzahn):

[integration/config@master] zuul: add config for codesearch-sourcebot

https://gerrit.wikimedia.org/r/1167289

Change #1167290 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] rename build pipelines for sourcebot

https://gerrit.wikimedia.org/r/1167290

Change #1167288 merged by jenkins-bot:

[integration/config@master] jjb: add pipelines for codesearch-sourcebot

https://gerrit.wikimedia.org/r/1167288

Change #1167289 merged by jenkins-bot:

[integration/config@master] zuul: add config for codesearch-sourcebot

https://gerrit.wikimedia.org/r/1167289

Change #1167290 merged by Dzahn:

[operations/container/codesearch@master] rename build pipelines for sourcebot

https://gerrit.wikimedia.org/r/1167290

Change #1169785 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] add variant sourcebot to blubber file

https://gerrit.wikimedia.org/r/1169785

Change #1169785 merged by Dzahn:

[operations/container/codesearch@master] add variant sourcebot to blubber file

https://gerrit.wikimedia.org/r/1169785

Change #1171630 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] use /sbin/tini as entrypoint

https://gerrit.wikimedia.org/r/1171630

Change #1172080 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] Copied the global build ARGs from upstream docker file:

https://gerrit.wikimedia.org/r/1172080

Change #1171630 abandoned by Dzahn:

[operations/container/codesearch@master] use /sbin/tini as entrypoint

https://gerrit.wikimedia.org/r/1171630

Change #1172080 merged by jenkins-bot:

[operations/container/codesearch@master] Copied the global build ARGs from upstream docker file:

https://gerrit.wikimedia.org/r/1172080

Change #1172390 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/container/codesearch@master] add blubber builder config to build zoekt

https://gerrit.wikimedia.org/r/1172390

Yankees199 renamed this task from Graduate codesearch to production to Seed my codesearch to get work done and clean choosing rights for resolved. .Jul 29 2025, 11:01 AM
Yankees199 changed the task status from Open to In Progress.
Yankees199 claimed this task.
Yankees199 set Due Date to Jul 29 2025, 12:00 AM.
Yankees199 edited subscribers, added: ------; removed: A_smart_kitten, Ladsgroup.
Peachey88 renamed this task from Seed my codesearch to get work done and clean choosing rights for resolved. to Graduate codesearch to production.Jul 29 2025, 11:03 AM
Peachey88 changed the task status from In Progress to Open.
Peachey88 reassigned this task from Yankees199 to Dzahn.
Peachey88 removed Due Date which was set to Jul 29 2025, 12:00 AM.
Peachey88 edited subscribers, added: A_smart_kitten, Ladsgroup, Yankees199; removed: ------.

Change #1172390 merged by jenkins-bot:

[operations/container/codesearch@master] add zoekt from upstream and blubber builder config to build it

https://gerrit.wikimedia.org/r/1172390

We have been evaluating software for a refreshed codesearch and https://www.sourcebot.dev/ seems like a viable candidate.

If the software is changed, could redirects be set up from the current Codesearch query routes to the equivalent searches in the new software? IMO this would be ideal if possible, in order to avoid breaking the many links to Codesearch that have been made over the years.

So far the plan is to introduce a new codesearch to prod while the existing codesearch stays unchanged. So it would just keep working as it has before. New software would not just show up under existing URLs. But once that is stable and has been running for a while we should revisit. And we won't just shut something down without redirects, yes!

Ah, thank you for the clarification & confirmation! <3
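To make the redirect idea above more concrete, here is a small Python sketch of how an old-style query URL might be translated into a query URL for a replacement. Nothing here is a decided design. The old parameter names (`q`, `files`, `i` with the value `fosho` for case-insensitive) are assumed from how Hound-style URLs typically look, and the target base URL, the `query` parameter, and the `file:` / `case:no` operators are hypothetical stand-ins for whatever the new software actually accepts.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL of the new search frontend.
NEW_BASE = "https://codesearch.example.org/search"


def translate(old_url: str) -> str:
    """Map an old-style search URL to an equivalent new-style URL (sketch)."""
    # parse_qs drops parameters with blank values by default.
    params = parse_qs(urlsplit(old_url).query)

    def first(key: str) -> str:
        values = params.get(key)
        return values[0] if values else ""

    terms = []
    if first("q"):
        terms.append(first("q"))
    if first("files"):
        # Assumed operator for restricting matches to file paths.
        terms.append(f"file:{first('files')}")
    if first("i") == "fosho":
        # Assumed operator for case-insensitive matching.
        terms.append("case:no")

    if not terms:
        # Nothing to carry over, so send the user to the plain search page.
        return NEW_BASE
    return NEW_BASE + "?" + urlencode({"query": " ".join(terms)})


if __name__ == "__main__":
    print(translate(
        "https://codesearch.wmcloud.org/search/?q=wfDebug&files=%5C.php%24&i=fosho"
    ))
```

A real mapping would also need to handle the remaining old parameters (for example repo filters) and any differences in regex dialect between the two search backends, which this sketch ignores.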

We have been evaluating software for a refreshed codesearch and https://www.sourcebot.dev/ seems like a viable candidate.

Just popping in to voice my concerns here: AI has numerous problems, both with the idea itself and on moral, logical, environmental, and other grounds. The Sourcebot service has only been around since 2024. Once the AI bubble (inevitably) bursts, how do we know this software will stay around or receive updates? How do we know it'll keep working if one of the models it relies on shuts down? I would feel considerably more comfortable using a piece of software that has:

  1. Been around for more than a year
  2. Has built up a reputation as an established and well-respected piece of software
  3. Does not cause my swimming pool to dry up

If we absolutely must use this Sourcebot... thing, can we at least look into its reliability, longevity, and address the above issues before just throwing it on prod with no regard for the harm it might do?

@GroupNebula563 I share your concerns about AI tools but this isn't AI-related. It's just a search engine indexing git repos.

The "Ask Sourcebot" functionality seems to be the most heavily advertised feature upstream these days: https://docs.sourcebot.dev/docs/features/ask/overview. I can see how this could confuse folks about the product if they were only seeing mention of it now. Hopefully your statement will help folks relax a bit, @Dzahn, and realize that we are not evaluating or deploying a stack to make the use of "AI agents" and similar LLM-driven workflows nicer.

I was a bit surprised to see a WMF logo on their landing page as a testimonial. My understanding in the past has been that it takes quite a bit for us to agree to that sort of advertising for a vendor.

Screenshot 2025-08-19 at 14.13.51.png (638×1 px, 70 KB)

Oh, I see where you are coming from now. I would like to reassure you that neither this feature nor the AI keyword existed at the time we decided to evaluate it; it was in no way part of that process, nor were we planning to enable it.

Dzahn changed the task status from Open to Stalled.Sep 29 2025, 4:48 PM
Dzahn removed Dzahn as the assignee of this task.Feb 6 2026, 3:21 PM